(54) Efficient CELP vocoder and method.

(57) The computational effort and time for CELP 
coding of speech is reduced by rearranging the 
recursive loop used to search vectors of an 
adaptive code book so that an impulse function 
of a short term perceptually weighted filter is 
first convolved with perceptually weighted 
target speech and the result cross-correlated 
with each code book vector to produce an error 
function for identifying the optimum adaptive 
codebook vector. In addition, autocorrelation is 
initially performed for only a small number of 
autocorrelation coefficients in each codebook 
vector and the values found are used to scan 
through all the vectors to find those giving a 
better match to input speech. Autocorrelation 
using all the vector values is then performed on 
this subset of vectors to identify the best vector 
for representing the frame of speech. An end 
correction procedure is used for vectors shorter 
than the speech frame length to avoid copy-up 
errors common in the prior art. An improved 
means and method for obtaining correlation 
coefficients for the stochastic codebook vectors
is also described.




EP 0 516 439 A2

[Front-page drawing: block diagram; legible labels are "STOCHASTIC CODEBOOK SEARCHER 225" and "STOCHASTIC CODEBOOK".]







Field of the Invention 

The present invention concerns an improved means and method for digital coding of speech or other analog 
signals and, more particularly, code excited linear predictive coding. 


Background of the Invention 

Code Excited Linear Predictive (CELP) coding is a well-known stochastic coding technique for speech
communication. In CELP coding, the short-time spectral and long-time pitch characteristics are modeled by a
set of time-varying linear filters. In a typical speech-coder-based communication system, speech is sampled
by an A/D converter at approximately twice the highest frequency desired to be transmitted, e.g., an 8 kHz
sampling frequency is typically used for a 4 kHz voice bandwidth. CELP coding synthesizes speech by utilizing
encoded excitation information to excite a linear predictive (LPC) filter. The excitation, which is used as
input to the filters, is modeled by a code book of white Gaussian signals. The optimum excitation is found by
searching through a codebook of candidate excitation vectors on a frame-by-frame basis.

LPC analysis is performed on the input speech frame to determine the LPC parameters. The analysis then
proceeds by comparing the output of the LPC filter with the digitized input speech when the LPC filter is
excited by various candidate vectors from the table, i.e., the code book. The best candidate vector is chosen
based on how well speech synthesized using the candidate excitation vector matches the input speech. This is
usually performed on several subframes of speech.
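The analysis-by-synthesis loop described above can be sketched as follows. This is a minimal illustration and not the coder of the invention: the function names, the first-order LPC filter used in the usage note, and the closed-form least-squares gain are assumptions made for the example.

```python
def synthesize(excitation, lpc):
    """All-pole filter 1/A(z) with A_0 = 1: y[n] = x[n] - sum_i lpc[i] * y[n-1-i]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for i, a in enumerate(lpc):
            if n - 1 - i >= 0:
                acc -= a * out[n - 1 - i]
        out.append(acc)
    return out


def best_codebook_entry(target, codebook, lpc):
    """Try every candidate vector; return (index, gain, squared error) of the best."""
    best = None
    for index, vector in enumerate(codebook):
        y = synthesize(vector, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

With a three-entry codebook and a target built from the second entry scaled by 2, the search recovers index 1 and a gain of 2. Every candidate is filtered separately here, which is the exhaustive cost the remainder of the description sets out to reduce.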

After the best match has been found, information specifying the best codebook entry, the LPC filter
coefficients and the gain coefficients are transmitted to the synthesizer. The synthesizer has the same copy of
the codebook and accesses the appropriate entry in that codebook, using it to excite the same LPC filter.

The codebook is made up of vectors whose components are consecutive excitation samples. Each vector
contains the same number of excitation samples as there are speech samples in the subframe or frame. The
excitation samples can come from a number of different sources. Long term pitch coding is determined by the
proper selection of a code vector from an adaptive codebook. The adaptive codebook is a set of different pitch
periods of the previously synthesized speech excitation waveform.

The optimum selection of a code vector, from either the stochastic or the adaptive codebook, depends
on minimizing the perceptually weighted error function. This error function is typically derived from a
comparison between the synthesized speech and the target speech for each vector in the codebook. These
exhaustive comparison procedures require a large amount of computation and are usually not practical for a
single Digital Signal Processor (DSP) to implement in real time. The ability to reduce the computation
complexity without sacrificing voice quality is important in the digital communications environment.

The error function and codebook vector search calculations are performed using vector and matrix operations
on the excitation information and the LPC filter. The problem is that a large number of calculations, for
example approximately 5 x 10^8 multiply-add operations per second for a 4.8 kbps vocoder, must be performed.
Prior art arrangements have not been entirely successful in reducing the number of calculations that must be
performed. Thus, a need continues to exist for improved CELP coding means and methods that reduce the
computational burden without sacrificing voice quality.

A prior art 4.8 kbit/second CELP coding system is described in Federal Standard FED-STD-1016 issued
by the General Services Administration of the United States Government. Prior art CELP vocoder systems are
described, for example, in U.S. Patents 4,899,385 and 4,910,781 to Ketchum et al., 4,220,819 to Atal, 4,797,925
to Lin, and 4,817,157 to Gerson, which are incorporated herein by reference.

Typical prior art CELP vocoder systems use an 8 kHz sampling rate and a 30 millisecond frame duration
divided into four 7.5 millisecond subframes. Prior art CELP coding consists of three basic functions: (1) short
delay "spectrum" prediction, (2) long delay "pitch" search, and (3) residual "code book" search.
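As a quick check of how the sampling rate, frame duration and subframe duration quoted above relate to sample counts, the arithmetic can be written out as below. The constant names are illustrative only.

```python
SAMPLE_RATE_HZ = 8000        # sampling rate quoted in the text
FRAME_MS = 30.0              # frame duration quoted in the text
SUBFRAME_MS = 7.5            # subframe duration quoted in the text

samples_per_frame = round(SAMPLE_RATE_HZ * FRAME_MS / 1000)        # 240 samples
samples_per_subframe = round(SAMPLE_RATE_HZ * SUBFRAME_MS / 1000)  # 60 samples
subframes_per_frame = samples_per_frame // samples_per_subframe    # 4 subframes
```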

While the present invention is described for the case of analog signals representing human speech, this
is merely for convenience of explanation and, as used herein, the word "speech" is intended to include any
form of analog signal of bandwidth within the sampling capability of the system.



SUMMARY OF THE INVENTION 

The present invention provides an improved means and method which substantially reduces the
computational burden of CELP coding speech based on adaptive and stochastic codebooks.

In a first embodiment, a recursive calculation loop is used to poll vectors of the adaptive codebook to select
the optimal excitation vector therefrom. In a preferred embodiment, an impulse function of a short term
perceptually weighted filter is convolved with perceptually weighted target speech and the result
cross-correlated with each vector in the adaptive codebook and combined with auto-correlated codebook vectors
and auto-correlated impulse functions to produce an error function. The adaptive codebook vector having the
minimum error function is chosen to represent the particular speech frame (or subframe) being examined.
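The saving from rearranging the loop can be illustrated with a short sketch. The target is combined with the impulse response once per frame, after which each codebook vector needs only a dot product for its cross-correlation term. The energy term below uses the common autocorrelation approximation (impulse-response autocorrelation times code-vector autocorrelation). That particular formula, and all function names, are assumptions for illustration; the source does not state them in this form.

```python
def backward_filter(target, impulse):
    """d[j] = sum_{n>=j} target[n] * impulse[n-j]; computed once per frame."""
    n_samples = len(target)
    return [sum(target[n] * impulse[n - j]
                for n in range(j, n_samples) if n - j < len(impulse))
            for j in range(n_samples)]


def autocorr(v, max_lag):
    """Autocorrelation of v for lags 0..max_lag inclusive."""
    return [sum(v[n] * v[n + m] for n in range(len(v) - m))
            for m in range(max_lag + 1)]


def match_score(d, code_vector, impulse_autocorr):
    """(cross-correlation)^2 / approximate filtered energy; larger is better."""
    cross = sum(dj * cj for dj, cj in zip(d, code_vector))
    c_auto = autocorr(code_vector, len(impulse_autocorr) - 1)
    energy = impulse_autocorr[0] * c_auto[0] + 2.0 * sum(
        hm * cm for hm, cm in zip(impulse_autocorr[1:], c_auto[1:]))
    return cross * cross / energy if energy > 0 else 0.0
```

The dot product of `d` with a code vector equals the correlation between the target and that code vector after filtering, so no per-vector convolution is needed. Maximizing this score corresponds to minimizing the gain-optimized squared error.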

An additional embodiment further simplifies the recursive loop for the adaptive code book by reducing the
number of autocorrelation operations that must be performed on the K vectors of the adaptive codebook, each
of which has N entries. Autocorrelation is initially performed for only a small number P ≪ N of autocorrelation
coefficients in each codebook vector and the values found are used to scan through all the K codebook vectors
looking for those S of the K codebook vectors (S ≪ K) which give the best match to the input speech. The
autocorrelation function for the S vectors is then recalculated for R autocorrelation coefficients (P < R ≤ N) and
the S codebook vectors are re-evaluated to determine which of the S adaptive codebook vectors gives the best
match to the input speech.
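A coarse-then-fine scan of this kind might look like the sketch below. It assumes a per-frame vector `d` holding the target already combined with the impulse response, and an impulse-response autocorrelation `h_auto`. Both names, and the score formula, are illustrative assumptions rather than the source's exact computation.

```python
def autocorr(v, lags):
    """Autocorrelation of v for lags 0..lags-1."""
    return [sum(v[n] * v[n + m] for n in range(len(v) - m)) for m in range(lags)]


def score(cross, h_auto, c_auto):
    """Squared cross-correlation over approximate energy, using the lags supplied."""
    energy = h_auto[0] * c_auto[0] + 2.0 * sum(
        h * c for h, c in zip(h_auto[1:], c_auto[1:]))
    return cross * cross / energy if energy > 0 else 0.0


def two_stage_search(d, codebook, h_auto, p, s, r):
    """Rank all K vectors with P lags, then re-rank the best S with R lags."""
    crosses = [sum(dj * cj for dj, cj in zip(d, c)) for c in codebook]
    coarse = [score(x, h_auto[:p], autocorr(c, p))
              for x, c in zip(crosses, codebook)]
    shortlist = sorted(range(len(codebook)),
                       key=lambda k: coarse[k], reverse=True)[:s]
    fine = {k: score(crosses[k], h_auto[:r], autocorr(codebook[k], r))
            for k in shortlist}
    return max(fine, key=fine.get)
```

Only the S shortlisted vectors pay for the R-lag autocorrelation, so the total cost is roughly K·P + S·R lag computations instead of K·R.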

A yet additional embodiment further reduces the number of calculations that must be carried out to
determine autocorrelation coefficients when codebook vectors of length M less than the frame length L are being
evaluated. Autocorrelation coefficients $U_k(m)$ of a first vector $C_k(n)$ of length M < L are calculated, where
k = 1, m is an autocorrelation lag index, n is an index of the successive samples in the codebook vector and
L is the analysis frame length, according to

$$U'_1(m) = \sum_{n=1}^{M} C_1(n)\,C_1(n+m) \tag{1}$$

$$U_1(m) = \frac{L}{M}\,U'_1(m) \tag{2}$$

for m = 0 to T < M. Autocorrelation coefficients $U_k(m)$ of the remaining codebook vectors are calculated
incrementally, where k ≥ 2, according to

$$U'_k(m) = U'_{k-1}(m) + C_k(M+k-1)\,C_k(M+k-1+m) \tag{3}$$

$$U_k(m) = \frac{L}{M+k-1}\,U'_k(m) \tag{4}$$

for m = 0 to T < M, and the process is repeated until (M+k-1) = L. It is preferred that T = M-1. The values of
$U'_1(m)$ and $U'_k(m)$ obtained are scaled by the indicated scaling factors, e.g., L/M for k=1, L/(M+1) for k=2,
and so forth until (M+k-1) = L. The autocorrelation coefficients obtained are used in determining which of the
codebook vectors $C_k(n)$ produces the least error when compared to input speech.
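The recursion of equations (1) to (4) can be sketched as follows. The sketch assumes that all vectors draw on one shared sample array, so that vector k covers its first M+k-1 samples, and that the array holds at least L+T samples. Both are simplifying assumptions for illustration and are not stated in the source.

```python
def incremental_autocorr(samples, m_len, frame_len, max_lag):
    """Return {k: [U_k(0), ..., U_k(max_lag)]}; samples is a 0-based list."""
    # Equation (1): direct sum over the first M samples (k = 1).
    raw = [sum(samples[n] * samples[n + m] for n in range(m_len))
           for m in range(max_lag + 1)]
    # Equation (2): scale by L / M.
    scaled = {1: [frame_len / m_len * u for u in raw]}
    k = 2
    while m_len + k - 1 <= frame_len:
        last = m_len + k - 2  # 0-based position of sample number M + k - 1
        # Equation (3): add one new product per lag instead of re-summing.
        raw = [u + samples[last] * samples[last + m] for m, u in enumerate(raw)]
        # Equation (4): scale by L / (M + k - 1).
        scaled[k] = [frame_len / (m_len + k - 1) * u for u in raw]
        k += 1
    return scaled
```

Each additional vector costs one multiply-add per lag rather than a full sum, and the last entry (M+k-1 = L) matches a direct autocorrelation over L samples.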

In a still further embodiment, a means and method is provided for more quickly and easily determining the
correlation coefficients of vectors of the stochastic codebook with other vectors generated by the CELP coding
process in order to identify the optimum stochastic codebook vector for replicating the target speech. In more
detail, a first vector V(n) having values identified by index n running from n=1 to N, and a set of second
vectors $S_k(n)$, wherein each of the second vectors is identified by index k and has up to N values which are
either zero or non-zero and are identified by index n from n=1 to N, are combined by identifying indices
$n_{k,i}$ of $S_k(n)$ for different k wherein $S_k(n_{k,i})$ are non-zero, adding values of V(n) corresponding to
indices $n_{k,i}$ to form sums Q(k), identifying the value k=j corresponding to the largest value Q(k=j),
and synthesizing speech using $S_j(n)$.

In a preferred embodiment, successive vectors of the set of second vectors are determined by overlap of
the preceding second vector according to an overlap amount Δk, Δn, and the identifying and adding steps
comprise identifying for k=1 indices $n_{1,i}$ of $S_k(n)$ wherein $S_1(n_{1,i})$ are non-zero, starting from
$n_{1,i}$ and using the overlap amount Δk, Δn, determining further indices $n_{k,i}$ for k>1 wherein
$S_k(n_{k,i})$ are non-zero, and adding values of V(n) for such indices and further indices to form sums Q(k).
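The idea of forming Q(k) by additions only, and of deriving the non-zero positions of later vectors from those found once, can be sketched as below. The sketch assumes the non-zero entries all equal +1, that the overlapped codebook is stored as one long sequence, and that vector k starts `shift` samples after vector k-1. These are illustrative assumptions, as are the function names.

```python
def overlapped_sums(v, sequence, shift):
    """Q(k) for every window of len(v) samples taken from an overlapped sequence."""
    n_len = len(v)
    # Non-zero positions are located once for the whole sequence.
    positions = [p for p, value in enumerate(sequence) if value != 0]
    count = (len(sequence) - n_len) // shift + 1
    sums = []
    for k in range(count):
        start = k * shift
        # Positions inside window k, re-indexed by the overlap offset.
        sums.append(sum(v[p - start] for p in positions
                        if start <= p < start + n_len))
    return sums


def best_vector(v, sequence, shift):
    """Index k = j giving the largest sum Q(k)."""
    q = overlapped_sums(v, sequence, shift)
    return max(range(len(q)), key=lambda k: q[k])
```

No multiplications are needed: each Q(k) is a sum of the V(n) values that line up with non-zero codebook entries.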

In a yet further embodiment, an N by N multiplexer having n=1 to N outputs, n=1 to N first inputs, second
inputs, and n=1 to N select means is used to combine codebook vectors. A first logic level presented to the n-th
select means couples the n-th output to the n-th first input and a second logic level presented to the n-th select
means couples the n-th output to the second input. The second input is conveniently a predetermined logic level.
The n=1 to N values of the first vector are supplied to the n=1 to N first inputs of the multiplexer and the n=1
to N values of the second vector of index k=1 are supplied to the n=1 to N select means of the
multiplexer. The second vector provides at the n=1 to N select means the first logic level for some values of n
and the second logic level for other values of n. The values of the first vector appearing at the multiplexer output
are added in an accumulator to provide a sum. The presenting and adding steps are repeated for further values
of k and speech is synthesized based on whichever second vector has the sum giving the closest match to target
speech.

In a preferred embodiment, each second vector is divided into two portions, a first portion having values 0,
1 corresponding to the location of values of 0, +1 of the second vector and a second portion having values 0,
1 corresponding to the location of values 0, -1 of the second vector. Each portion is desirably stored in a ROM.
Associated with each portion is a multiplexer such as is described above and operating in the same fashion,
each multiplexer providing an accumulated output sum based on the 0, 1 values of its associated portion of
the second vector. The presenting and adding steps are repeated for each ROM-multiplexer combination for
each value of k as described above and the sums resulting from each are combined in a further adder (or
subtracter) to provide a combined output for use in selecting which vector gives the closest match to the
target speech.
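In software terms, the two-ROM, two-multiplexer arrangement amounts to splitting a ternary vector into a +1 mask and a -1 mask, gating the first vector with each mask, and subtracting the two accumulated sums. The sketch below is a behavioural model only, with hypothetical function names and a second input fixed at zero.

```python
def split_masks(ternary_vector):
    """Return 0/1 masks marking the +1 positions and the -1 positions."""
    plus = [1 if value == 1 else 0 for value in ternary_vector]
    minus = [1 if value == -1 else 0 for value in ternary_vector]
    return plus, minus


def mux_accumulate(first_inputs, select_bits, second_input=0):
    """Each select bit passes the matching first input or the fixed second input."""
    return sum(x if bit else second_input
               for x, bit in zip(first_inputs, select_bits))


def ternary_correlation(v, ternary_vector):
    """Accumulated +1 sum minus accumulated -1 sum."""
    plus, minus = split_masks(ternary_vector)
    return mux_accumulate(v, plus) - mux_accumulate(v, minus)
```

For `v = [3, -1, 4, 2]` and the ternary vector `[1, 0, -1, 1]` the result is 1, the same value an ordinary multiply-accumulate dot product would give.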

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 illustrates, in simple block diagram and generalized form, a CELP vocoder system;

FIGS. 2A-B illustrate, in simplified block diagram form, a CELP coder according to a preferred embodiment
of the present invention;

FIG. 3 illustrates, in greater detail, a portion of the coder of FIG. 2B, according to a first embodiment;

FIG. 4 illustrates, in greater detail, a portion of the coder of FIG. 2B, according to a preferred embodiment
of the present invention;

FIG. 5 illustrates an apparatus for providing autocorrelation coefficients of the adaptive code book vectors
according to a preferred embodiment of the present invention;

FIG. 6 illustrates the content of a small stochastic codebook of a type used for CELP coding;

FIG. 7 is a simplified block diagram of a cross-correlation function according to the present invention;

FIG. 8 is a schematic diagram showing further details of the multiplexers used in FIG. 7; and

FIGS. 9-10 illustrate the content of first and second memory means whose entries correspond to non-zero
entries of the codebook of FIG. 6.


DETAILED DESCRIPTION 

FIG. 1 illustrates, in simplified block diagram form, a vocoder transmission system utilizing CELP coding.
CELP coder 100 receives incoming speech 102 and produces CELP coded output signal 104. CELP coded
signal 104 is sent via transmission path or channel 106 to CELP decoder 300 where facsimile 302 of original
speech signal 102 is reconstructed by synthesis. Transmission channel 106 may have any form, but typically
is a wired or radio communication link of limited bandwidth. CELP coder 100 is frequently referred to as an
"analyzer" because its function is to determine CELP code parameters 104 (e.g., code book vectors, gain
information, LPC filter parameters, etc.) which best represent original speech 102. CELP decoder 300 is
frequently referred to as a synthesizer because its function is to recreate output synthesized speech 302 based
on incoming CELP coded signal 104. CELP decoder 300 is conventional, is not a part of the present invention
and will not be discussed further.

FIGS. 2A-B show CELP coder 100 in greater detail and according to a preferred embodiment of the present
invention. Incoming analog speech signal 102 is first band-passed by filter 110 to prevent aliasing. Band-passed
analog speech signal 111 is then sampled by analog to digital (A/D) converter 112. Sampling is usually at the
Nyquist rate, for example at 8 kHz for a 4 kHz CELP vocoder. Other sampling rates may also be used. Any
suitable A/D converter may be used. Digitized signal 113 from A/D converter 112 comprises a train of samples,
e.g., a train of narrow pulses whose amplitudes correspond to the envelope of the speech waveform.

Digitized speech signal 113 is then divided into frames or blocks, that is, successive time brackets
containing a predetermined number of digitized speech samples, as for example, 60, 180 or 240 samples per frame.
This is customarily referred to as the "frame rate" in CELP processing. Other frame rates may also be used.
This is accomplished in framer 114. Means for accomplishing this are well known in the art. Successive speech
frames 115 are stored in frame memory 116. Output 117 of frame memory 116 sends frames 117 of digitized
speech 115 to blocks 122, 142, 162 and 235 whose function will be presently explained.

Those of skill in the art understand that frames of digitized speech may be further divided into subframes
and speech analysis and synthesis performed using subframes. As used herein, the word "frame", whether
singular or plural, is intended to refer to both frames and subframes of digitized speech.

CELP coder 100 uses two code books, i.e., adaptive codebook 155 and stochastic codebook 180 (see FIG.
2B). For each speech frame 115, coder 100 calculates LPC coefficients 123 representing the formant
characteristics of the vocal tract. Coder 100 also searches for entries (vectors) from both stochastic codebook 180
and adaptive codebook 155 and associated scaling (gain) factors that, when used to excite a filter with LPC
coefficients 123, best approximate input speech frame 117. The LPC coefficients, the codebook vectors and
the scaling (gain coefficient) information are processed and sent to channel coder 210 where they are combined
to form coded CELP signal 104 which is transmitted by path 106 to CELP decoder 300. The process by which
this is done will now be explained in more detail.

Referring now to data path 121 containing blocks 122, 125, 130 and 135, LPC analyzer 122 is responsive
to incoming speech frames 117 to determine LPC coefficients 123 using well-known techniques. LPC
coefficients 123 are in the form of Line Spectral Pairs (LSPs) or Line Spectral Frequencies (LSFs), terms which are
well understood in the art. LSPs 123 are quantized by coder 125 and quantized LPC output signal 126 is sent to
channel coder 210 where it forms a part (i.e., the LPC filter coefficients) of CELP signal 104 being sent via
transmission channel 106 to decoder 300.

Quantized LPC coefficients 126 are decoded by decoder 130 and the decoded LSPs are sent via output signals
131, 132, respectively, to spectrum inverse filters 145 and 170, which are described in connection with data
paths 141 and 161, and via output signal 133 to bandwidth expansion weighting generator 135. Signals 131,
132 and 133 contain information on decoded quantized LPC coefficients. Means for implementing coder 125
and decoder 130 are well known in the art.

Bandwidth expansion weighting generator 135 provides a scaling factor (typically ≈ 0.8) and performs the
function of bandwidth expansion of the formants, producing output signals 136, 137 containing information on
bandwidth expanded LPC filter coefficients. Signals 136, 137 are sent, respectively, to cascade weighting filters
150 and 175 whose function will be explained presently.

Referring now to data path 141 containing blocks 142, 145 and 150, spectral predictor memory subtracter
142 subtracts previous states 196 (i.e., left by the immediately preceding frame) in short term spectrum predictor
filter 195 (see FIG. 2B) from input sampled speech 115 arriving from frame memory 116 via 117. Subtracter
142 provides speech residual signal 143 which is digitized input speech 115 minus what is referred to in the
art as the filter ringing signal or the filter ringdown. The filter ringing signal arises because an impulse used to
excite a filter (e.g., LPC filter 195 in FIG. 2B) in connection with a given speech frame does not completely
dissipate by the end of that frame, but may cause filter excitation (i.e., "ringing") extending into a subsequent frame.
This ringing signal appears as distortion in the subsequent frame, since it is unrelated to the speech content
of that frame. If the ringing signal is not removed, it affects the choice of code parameters and degrades the
quality of the speech synthesized by decoder 300.
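The ringing removed by the memory subtracter is the zero-input response of the synthesis filter, that is, what the filter keeps emitting from its stored state when no new excitation arrives. A minimal sketch follows, assuming an all-pole filter with leading coefficient 1 and a state list holding past outputs with the newest first. The names and the state layout are illustrative assumptions.

```python
def zero_input_response(lpc_tail, state, length):
    """Ringing of 1/A(z) with zero input; lpc_tail = [A_1, ..., A_n]."""
    history = list(state)  # past outputs, newest first
    ringing = []
    for _ in range(length):
        y = -sum(a * h for a, h in zip(lpc_tail, history))
        ringing.append(y)
        history = [y] + history[:-1]  # shift the new output into the state
    return ringing


def remove_ringing(frame, lpc_tail, state):
    """Subtract the previous frame's ringdown from the current frame."""
    ringing = zero_input_response(lpc_tail, state, len(frame))
    return [x - r for x, r in zip(frame, ringing)]
```

If a frame consists of nothing but the ringdown from the previous frame, the residual is all zeros, so the codebook search is not asked to model energy that the filter memory already explains.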

Speech residual signal 143, containing information on speech 115 minus filter ringing signal 196, is fed into
spectrum inverse filter 145 along with signal 131 from decoder 130. Filter 145 is typically implemented as a
zero filter (i.e., $A(z) = A_0 + A_1 z^{-1} + \dots + A_n z^{-n}$, where the A's are LPC filter coefficients and z is
the "Z transform" variable of the filter), but other means well known in the art may also be used. Signals 131
and 143 are combined in filter 145 by convolution to create LPC inverse-filtered speech. Output signal 146 of
filter 145 is sent to cascade weighting filter 150. Filter 150 is typically implemented as a pole filter (i.e.,
$1/A(z/r)$, where $A(z/r) = A_0 + A_1 r z^{-1} + \dots + A_n r^n z^{-n}$, the A's are LPC filter coefficients, r is an
expansion factor and z is the "Z transform" variable of the filter), but other means well known in the art may
also be used.

Output signal 152 from block 150 is perceptually weighted LPC impulse function H(n) derived from the
convolution of an impulse function (e.g., 1, 0, 0, ..., 0) with bandwidth expanded LPC coefficient signal 136 arriving
from block 135. Signal 136 is also combined with signal 146 in block 150 by convolution to create, at output
151, perceptually weighted short delay target speech signal X(n) derived from path 141.
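The zero filter A(z), the pole filter 1/A(z/r) and the impulse function H(n) can be sketched together as below. The sketch assumes direct-form filters with A_0 = 1 and zero initial state, and the function names are illustrative. The default r = 0.8 follows the typical scaling factor mentioned earlier.

```python
def inverse_filter(signal, a):
    """Zero (FIR) filter A(z): y[n] = sum_i a[i] * x[n-i]."""
    return [sum(a[i] * signal[n - i] for i in range(len(a)) if n - i >= 0)
            for n in range(len(signal))]


def weighting_filter(signal, a, r=0.8):
    """Pole filter 1/A(z/r) with a[0] == 1: y[n] = x[n] - sum_{i>=1} a[i] r^i y[n-i]."""
    out = []
    for n, x in enumerate(signal):
        acc = x
        for i in range(1, len(a)):
            if n - i >= 0:
                acc -= a[i] * r ** i * out[n - i]
        out.append(acc)
    return out


def impulse_function(a, length, r=0.8):
    """H(n): response of the weighting filter to the impulse (1, 0, 0, ..., 0)."""
    return weighting_filter([1.0] + [0.0] * (length - 1), a, r)
```

With r = 1 the pole filter exactly undoes the zero filter. With r < 1 the poles move toward the origin, which broadens the formant peaks and makes H(n) decay faster.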

Outputs 151 and 152 of weighting filter 150 are fed to adaptive codebook searcher 220. Target speech
signal 151 (i.e., X(n)) and perceptually weighted impulse function signal 152 (i.e., H(n)) are used by the searcher
220 and adaptive codebook 155 to determine the pitch period (i.e., the excitation vector for filter 195) and the
gain therefor which most closely correspond to digitized input speech frame 117. The manner in which this
is accomplished is explained in more detail in connection with FIGS. 3-4.

Referring now to data path 161 which contains blocks 162, 165, 170 and 175, pitch predictor memory
subtracter 162 subtracts previous filter states 192 in long delay pitch predictor filter 190 from digitized input sampled
speech 115 received from memory 116 via 117 to give output signal 163 consisting of sampled speech minus
the ringing of long delay pitch predictor filter 190. Output signal 163 is fed to spectrum predictor memory
subtracter 165.

Spectral memory subtracter 165 performs the same function as described in connection with block 142 and
subtracts out short delay spectrum predictor ("spectral") filter ringing or ringdown signal 196 from digitized input
speech frame 117 transmitted via pitch subtracter 162. This produces remainder output signal 166 consisting
of current frame sampled speech 117 minus the ringing of long delay ("pitch") filter 190 and short delay
("spectral") filter 195 left over from the previous frame. Remainder signal 166 is fed to spectrum inverse filter 170
which is analogous to block 145.

Inverse filter 170 receives remainder signal 166 and output 132 of decoder 130. Signal 132 contains
information on decoded quantized LPC coefficients. Filter 170 combines signals 166 and 132 by convolution to
create output signal 171 comprising LPC inverse-filtered speech. Output signal 171 is sent to cascade weighting
filter 175 analogous to block 150.

Weighting filter 175 receives signal 171 from filter 170 and signal 137 from bandwidth expansion weighting
generator 135. Signal 137 contains information on bandwidth expanded LPC coefficients. Cascade weighting
filter 175 produces output signals 176, 177. Filter 175 is typically implemented as a pole filter (i.e., only poles
in the complex plane), but other means well known in the art may also be used.

Signals 137, 171 are combined in filter 175 by convolution to create at output 177, perceptually weighted 
LPC impulse function H(n) derived from path 121, and create at output 176, perceptually weighted long delay 
and short delay target speech signal Y(n) derived from path 161. Output signals 176, 177 are sent to stochastic 
searcher 225. 

Stochastic searcher 225 uses stochastic code book 180 to select an optimum white noise vector and an
optimum scaling (gain) factor which, when applied to pitch and LPC filters 190, 195 of predetermined coefficients,
provide the best match to input digitized speech frame 117. Stochastic searcher 225 performs operations well
known in the art and generally analogous to those performed by adaptive searcher 220 described more fully
in connection with FIGS. 3-4.

In summary, in chain 141, spectrum inverse filter 145 receives LSPs 131 and residual 143 and sends its
output 146 to cascade weighting filter 150 to generate perceptually weighted LPC impulse function response
H(n) at output 152 and perceptually weighted short delay target speech signal X(n) at output 151. In chain 161,
spectrum inverse filter 170 receives LSPs 132 and short delay and long delay speech residual 166, and sends
its output 171 to weighting filter 175 to generate perceptually weighted LPC impulse function H(n) at output
177 and perceptually weighted short and long term delay target speech signal Y(n) at output 176.

Blocks 135, 150, 175, collectively labelled 230, provide the perceptual weighting function. The decoded
LSPs from chain 121 are used to generate the bandwidth expansion weighting factor at outputs 136, 137 in block
135. Weighting factors 136, 137 are used in cascade weighting filters 150 and 175 to generate perceptually
weighted LPC impulse function H(n). The elements of perceptual weighting block 230 are responsive to the
LPC coefficients to calculate spectral weighting information in the form of a matrix that emphasizes those
portions of speech that are known to have important speech content. This spectral weighting information 1/A(z/r)
is based on finite impulse response H(n) of cascade weighting filters 150 and 175. The utilization of finite
impulse response function H(n) greatly reduces the number of calculations which codebook searchers 220 and
225 must perform. The spectral weighting information is utilized by the searchers in order to determine the best
candidate for the excitation information from the code books 155 and 180.

Continuing to refer to FIGS. 2A-B, adaptive codebook searcher 220 generates optimum adaptive codebook 
vector index 221 and associated gain 222 to be sent to channel coder 210. Stochastic codebook searcher 225 
generates optimum stochastic codebook vector index 226, and associated gain 227 to be sent to channel coder 
210. These signals are encoded by channel coder 210. 

Channel coder 210 receives five signals: quantized LSPs 126 from coder 125, optimum stochastic
codebook vector index 226 and gain setting 227 therefor, and optimum adaptive codebook vector index 221 and
gain setting 222 therefor. The output of channel coder 210 is serial bit stream 104 of the encoded parameters.
Bit stream 104 is sent via channel 106 to CELP decoder 300 (see FIG. 1) where, after decoding, the recovered
LSPs, codebook vectors and gain settings are applied to identical filters and codebooks to produce synthesized
speech 302.

As has already been explained, CELP coder 100 determines the optimum CELP parameters to be
transmitted to decoder 300 by a process of analysis, synthesis and comparison. The results of using trial CELP
parameters must be compared to the input speech frame by frame so that the optimum CELP parameters can be
selected. Blocks 190, 195, 197, 200, 205, and 235 are used in conjunction with the blocks already described
in FIGS. 2A-B to accomplish this. The selected CELP parameters (LSP coefficients, codebook vectors and
gain, etc.) are passed via output 211 to decoder 182 from whence they are distributed to blocks 190, 195, 197,
200, 205, and 235 and thence back to blocks 142, 145, 150, 162, 165, 170 and 175 already discussed.

Block 182 is identified as a "channel decoder" having the function of decoding signal 211 from coder 210
to recover signals 126, 221, 222, 226, 227. However, those of skill in the art will understand that the
code-decode operation indicated by blocks 210-182 may be omitted and signals 126, 221, 222, 226, 227 fed in
uncoded form to block 182 with block 182 merely acting as a buffer for distributing the signals to blocks 190, 195,
197, 200, 205, and 235. Either arrangement is satisfactory, and the words "channel coder 182", "coder
182" or "block 182" are intended to indicate either arrangement or any other means for passing such
information.

The output signals of decoder 182 are quantized LSP signal 126, which is sent to block 195; adaptive
codebook index signal 221, which is sent to block 190; adaptive codebook vector gain index signal 222, which is
sent to block 190; stochastic codebook index signal 226, which is sent to block 180; and stochastic codebook
vector gain index signal 227, which is sent to block 197. These signals excite filter 190, thereby producing output
191 which is fed to adaptive codebook 155 and to filter 195. Output 191, in combination with output 126 of
coder 182, further excites filter 195 to produce synthesized speech 196.

Synthesizer 228 comprises gain multiplier 197, long delay pitch predictor 190, short delay spectrum
predictor 195, subtracter 235, spectrum inverse filter 200 and cascade weighting filter 205. Using the decoded
parameters 126, 221, 222, 226 and 227, stochastic code vector 179 is selected and sent to gain multiplier 197
to be scaled by gain parameter 227. Output 198 of gain multiplier 197 is used by long delay pitch predictor 190
to generate speech residual 191. Filter state output information 192, also referred to in the art as the speech
residual of predictor filter 190, is sent to pitch memory subtracter 162 for filter memory update. Short delay
spectrum predictor 195, which is an LPC filter whose parameters are set by incoming LPC parameter signal 126,
is excited by speech residual 191 to produce synthesized digital speech output 196. The same speech residual
signal 191 is used to update adaptive codebook 155.

Synthesized speech 196 is subtracted from digitized input speech 117 by subtracter 235 to produce digital
speech remainder output signal 236. Speech remainder 236 is fed to spectrum inverse filter 200 to generate
residual error signal 202. Output signal 202 is fed to cascade weighting filter 205, and output filter state
information 206, 207 is used to update cascade weighting filters 150 and 175 as previously described in
connection with signal paths 141 and 161. Output signal 201, 203, which is the filter state information of spectrum
inverse filter 200, is used to update spectrum inverse filters 145 and 170 as previously described in
connection with blocks 145, 170.

FIGS. 3-4 are simplified block diagrams of adaptive codebook searcher 220. FIG. 3 shows a suitable
arrangement for adaptive codebook searcher 220 and FIG. 4 shows a further improved arrangement. The
arrangement of FIG. 4 is preferred.

Referring now to FIGS. 3-4 generally, the information in adaptive codebook 155 is excitation information
from previous frames. For each frame, the excitation information consists of the same number of samples as
the sampled original speech. Codebook 155 is conveniently organized as a circular list so that a new set of
samples is simply shifted into codebook 155, replacing the earliest samples presently in the codebook. The new
excitation samples are provided by output 191 of long delay pitch predictor 190.

When utilizing excitation information out of codebook 155, searcher 220 deals in sets, i.e., subframes, and
does not treat the vectors as disjointed samples. Searcher 220 treats the samples in codebook 155 as a linear
array. For example, for 60 sample frames, searcher 220 forms the first candidate set of information by utilizing
samples 1 through 60 from codebook 155, the second set of candidate information by using samples 2 through 61,
and so on. This type of codebook searching is often referred to as an overlapping codebook
search. The present invention is not concerned with the structure and function of codebook 155, but with how
codebook 155 is searched to identify the optimum codebook vector.
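
The overlapping search described above can be illustrated with a minimal sketch. It assumes the codebook is held as a plain linear array; the function name `overlapping_candidates` and the 64-sample codebook are hypothetical and serve only to show how successive candidate sets are displaced by one sample.

```python
import numpy as np

def overlapping_candidates(codebook, frame_len):
    """Return every frame_len-sample window of a linear codebook, each shifted by one sample."""
    codebook = np.asarray(codebook, dtype=float)
    count = len(codebook) - frame_len + 1
    return np.stack([codebook[k:k + frame_len] for k in range(count)])

# Hypothetical codebook holding samples numbered 1..64 (the values equal their positions).
codebook = np.arange(1, 65)
candidates = overlapping_candidates(codebook, 60)
# candidates[0] covers samples 1 through 60, candidates[1] covers samples 2 through 61.
```

With 64 stored samples and 60-sample frames, five candidate sets result, each sharing 59 samples with its neighbor.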

Adaptive codebook searcher 220 accesses previously synthesized pitch information 156 already stored in
adaptive codebook 155 from output 191 in FIG. 2B, and utilizes each such set of information 156 to minimize
an error criterion between target excitation 151 received from block 150 and accessed excitation 156 from
codebook 155. Scaling factor or gain index 222 is also calculated for each accessed set of information 156 since
the information stored in adaptive codebook 155 does not allow for the changes in dynamic range of human
speech or other input signal.

The preferred error criterion used is the Minimum Squared Prediction Error (MSPE), which is the square
of the difference between the original speech frame 115 from frame memory output 117 and synthetic speech
196 produced at the output of block 195 of FIG. 2B. Synthetic speech 196 is calculated in terms of trial excitation
information 156 obtained from the codebook 155. The error criterion is evaluated for each candidate vector or
set of excitation information 156 obtained from codebook 155, and the particular set of excitation information
156' giving the lowest error value is the set of information utilized for the present frame (or subframe).

After searcher 220 has determined the best match set of excitation information 156' to be utilized along 
with a corresponding best match scaling factor or gain 222', vector index output signal 221 corresponding to 
best match index 156' and scaling factor 222 corresponding to the best match scaling factor 222' are transmitted
to channel encoder 210. 

FIG. 3 shows a block diagram of adaptive searcher 220 according to a first embodiment and FIG. 4 shows
adaptive searcher 220' according to a further improved and preferred embodiment. Adaptive searchers 220,
220' perform a sequential search through the adaptive codebook 155 vector indices C_1(n) ... C_K(n). During
the sequential search operation, searchers 220, 220' access each candidate excitation vector C_k(n) from the
codebook 155, where k is an index running from 1 to K identifying the particular vector in the codebook and
where n is a further index running from n=1 to n=N, where N is the number of samples within a given frame. In
a typical CELP application K = 256 or 512 or 1024 and N = 60 or 120 or 240; however, other values of K and
N may also be used.

Adaptive codebook 155 contains sets of different pitch periods determined from the previously synthesized
speech waveform. The first sample vector starts from the Nth sample of the synthesized speech waveform C_k(N),
which is located N samples back from the current last sample of the synthesized speech waveform. In human
voice, the pitch frequency is generally around 40 Hz to 500 Hz. This translates to about 200 to 16 samples. If
fractional pitch is involved in the calculation, K can be 256 or 512 in order to represent the pitch range. Therefore,
the adaptive codebook contains a set of K vectors C_k(n) which are basically samples of one or more pitch
periods of a particular frequency.
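
The conversion from pitch frequency to pitch period in samples can be checked with a short calculation. The sketch below assumes an 8 kHz sampling rate, which is not stated in this passage but reproduces the figures of about 200 and 16 samples; the function name `pitch_period_samples` is hypothetical.

```python
def pitch_period_samples(pitch_hz, sample_rate_hz=8000):
    """Number of samples in one pitch period (the 8 kHz sampling rate is an assumption)."""
    return sample_rate_hz / pitch_hz

longest = pitch_period_samples(40)    # lowest pitch frequency gives the longest period
shortest = pitch_period_samples(500)  # highest pitch frequency gives the shortest period
```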

Referring now to FIG. 3, convolution generator 510 of adaptive codebook searcher 220 convolves each
codebook vector C_k(n), i.e., signal 156, with perceptually weighted LPC impulse response function H(n), i.e.,
signal 152 from cascade weighted filter 150. Output 512 of convolution generator 510 is then cross-correlated
with target speech residual signal X(n) (i.e., signal 151 of FIGS. 2A-B) in cross-correlator 520. The convolution
and correlation are done for each codebook vector C_k(n) where n = 1, ..., N. The operation performed by
convolution generator 510 is expressed mathematically by equation (1) below:



$$Z_k(n) = \sum_{m=1}^{n} C_k(m)\,H(n-m+1), \quad n = 1, \ldots, N \tag{1}$$

The operation performed by cross correlation generator 520 is expressed mathematically by equation (2) below: 

$$\sum_{n=1}^{N} Z_k(n)\,X(n) \tag{2}$$

Output 512 of convolution generator 510 is also fed to energy calculator 535 comprising squarer 552 and
accumulator 553 (accumulator 553 provides the sum of the squares determined by squarer 552). Output 554 is
delivered to divider 530 which calculates the ratio of signals 551 and 554. Output 521 of cross-correlator 520
is fed to squarer 525 whose output 551 is also fed to divider 530. Output 531 of divider 530 is fed to peak selector
circuit 570 whose function is to determine which value C_k(m) of C_k(n) produces the best match, i.e., the greatest
cross-correlation. This can be expressed mathematically by equations (3a) and (3b). Equation (3a) expresses
the error E.



$$E = X^2(n) - G_k \left[ \sum_{n=1}^{N} X(n) \left[ \sum_{m=1}^{n} C_k(m)\,H(n-m+1) \right] \right] \tag{3a}$$

To minimize error E is to maximize the cross-correlation expressed by equation (3b) below, where G_k is defined
by equation (4):

$$G_k \left[ \sum_{n=1}^{N} X(n) \left[ \sum_{m=1}^{n} C_k(m)\,H(n-m+1) \right] \right] \tag{3b}$$

The identification (index) of the optimum vector index C_k(m) is delivered to output 221. Output 571 of peak
selector 570 carries the gain scaling information associated with best match pitch vector C_k(m) to gain calculator
580 which provides gain index output 222. The operation performed by gain calculator 580 is expressed
mathematically by equation (4) below.



$$G_k = \frac{\sum_{n=1}^{N} X(n) \left[ \sum_{m=1}^{n} C_k(m)\,H(n-m+1) \right]}{\sum_{n=1}^{N} \left[ \sum_{m=1}^{n} C_k(m)\,H(n-m+1) \right]^2} \tag{4}$$

Outputs 221 and 222 are sent to channel coder 210. Means for providing convolution generator 510,
cross-correlation generator 520, squarers 525 and 552 (which perform like functions on different inputs), accumulator
553, divider 530, peak selector 570 and gain calculator 580 are individually well known in the art.
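
As an illustration of the FIG. 3 signal flow, the following is a minimal sketch of an exhaustive search that follows equations (1), (2) and (4) and selects the candidate with the largest squared correlation divided by energy. It is not the patented apparatus; the function name `search_fig3`, the test data and the zero-energy guard are assumptions made for the example.

```python
import numpy as np

def search_fig3(codebook_vectors, h, x):
    """Exhaustive search returning (best index, gain) along the lines of equations (1)-(4)."""
    n = len(x)
    best_k, best_score, best_gain = -1, -np.inf, 0.0
    for k, c in enumerate(codebook_vectors):
        z = np.convolve(c, h)[:n]       # eq. (1): convolution truncated to the frame
        corr = float(np.dot(z, x))      # eq. (2): cross-correlation with the target
        energy = float(np.dot(z, z))    # squarer 552 and accumulator 553
        if energy == 0.0:               # guard added for the sketch only
            continue
        score = corr * corr / energy    # ratio formed by divider 530
        if score > best_score:
            best_k, best_score, best_gain = k, score, corr / energy  # eq. (4)
    return best_k, best_gain

# Hypothetical data: the target is candidate 1 filtered by h and scaled by 2.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((4, 8))
h = np.array([1.0, 0.5, 0.25])
x = 2.0 * np.convolve(vectors[1], h)[:8]
best_k, gain = search_fig3(vectors, h, x)
```

Note that the convolution sits inside the loop over candidates, so it is repeated once per codebook vector for every frame.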

While the arrangement of FIG. 3 provides satisfactory results, it requires more computations to perform the
necessary convolutions and correlations on each code book vector than are desired. This is because
convolution 510 and correlation 520 must both be performed on every candidate vector in code book 155 for each
speech frame 117. This limitation of the arrangement of FIG. 3 is overcome with the arrangement of FIG. 4.

Adaptive codebook searcher 220' of FIG. 4 convolves a frame of perceptually weighted target speech X(n) (i.e.,
signal 151 of FIGS. 2A-B) with the perceptually weighted impulse response function H(n) of a short
term LPC filter (i.e., output 152 of block 150 of FIG. 2) in convolution generator 510' to generate convolution
signal W(n). This is done only once per frame 117 of input speech. This immediately reduces the computational
burden by a large factor approximately equal to the number of candidate vectors in the codebook. This is a
very substantial computational saving. The operation performed by convolution generator 510' is expressed
mathematically by equation (5) below:

$$W(n) = \sum_{m=1}^{n} X(m)\,H(n-m+1), \quad n = 1, \ldots, N \tag{5}$$

Output 512' of convolution generator 510' is then correlated with each vector C k (n) in adaptive codebook 155 
by cross-correlation generator 520'. The operation performed by cross correlation generator 520' is expressed 
mathematically by equation (6) below: 

$$\sum_{n=1}^{N} W(n)\,C_k(n) \tag{6}$$

Output 551' is squared by squarer 525' to produce output 521' which is the square of the correlation of
each vector C_k(n) normalized by the energy of the candidate vector C_k(n). This is accomplished by providing
each candidate vector C_k(n) (output 156) to auto-correlation generator 560' and by providing filter impulse
response H(n) (from output 152) to auto-correlation generator 550' whose outputs are subsequently manipulated
and combined. Output 552' of auto-correlation generator 550' is fed to look-up table 555' whose function is
explained later. Output 556' of table 555' is fed to multiplier 543' where it is combined with output 561' of
auto-correlator 560'.

Output 545' of multiplier 543' is fed to accumulator 540' which sums the products for successive values of 
n and sends the sum 541' to divider 530' where it is combined with output 521' of cross-correlation generator 
520'. The operation performed by auto-correlator 560' is described mathematically by equation (7) and the
operation performed by auto-correlator 550' is described mathematically by equation (8):



$$U_k(m) = \sum_{n=1}^{N} [C_k(n)\,C_k(m+n)], \quad m = 0, \ldots, N-1 \tag{7}$$

$$\phi(m) = \sum_{n=1}^{N} [H(n)\,H(m+n)], \quad m = 0, \ldots, N-1 \tag{8}$$

where,

C_k(n) is the k-th adaptive code book vector, each vector being identified by the index k running from 1 to K,

H(n) is the perceptually weighted LPC impulse response,

N is the number of digitized samples in the analysis frame, and

m is a dummy integer index and n is the integer index indicating which of the N samples within the speech
frame is being considered.

The search operation compares each candidate vector C_k(n) with the target speech residual X(n) using
MSPE search criteria. Each candidate vector C_k(n) received from output 156 of codebook 155 is sent to
auto-correlation generator 560' which generates all autocorrelation coefficients of the candidate vector to produce
autocorrelation output signal 561' which is fed to energy calculator 535' comprising blocks 543' and 540'.

Autocorrelation generator 550' generates all the autocorrelation coefficients of the H(n) function to produce
autocorrelation output signal 552' which is fed to energy calculator 535' through table 555' and output 556'.

Energy calculator 535' combines input signals 556' and 561' by summing all the product terms of all the
auto-correlation coefficients of candidate vectors C_k(n) and perceptually weighted impulse function H(n)
generated by cascade weighting filter 150. Energy calculator 535' comprises multiplier 543' to multiply the
auto-correlation coefficients of the C_k(n) with the same delay term of the auto-correlation coefficients of H(n) (signals
561' and 552') and accumulator 540' which sums the output of multiplier 543' to produce output 541' containing
information on the energy of the candidate vector which is sent to divider 530'. Divider 530' performs the energy
normalization which is used to set the gain. The energy of the candidate vector C_k(n) is calculated very efficiently
by summing all the product terms of all the autocorrelation coefficients of candidate vectors C_k(n) and
perceptually weighted impulse function H(n) of perceptually weighted short term filter 150. The above-described
operation to determine the loop gain G_k is described mathematically by equation (9) below.



$$G_k = \frac{\sum_{n=1}^{N} C_k(n) \left[ \sum_{m=1}^{n} X(m)\,H(n-m+1) \right]}{U_k(0)\,\phi(0) + 2 \sum_{n=1}^{N} [U_k(n)\,\phi(n)]} \tag{9}$$



where

C_k(n), X(m), H(n), φ(n), U_k(n) and N are as previously defined and G_k is the loop gain for the k-th code
vector.

Table 555' permits the computational burden to be further reduced. This is because auto-correlation
coefficients 552' of the impulse function H(n) need be calculated only once per frame 117 of input speech. This can
be done before the codebook search and the results stored in table 555'. The auto-correlation coefficients 552' stored in
table 555' before the codebook search are then used later to calculate the energy for each candidate vector
from adaptive codebook 155. This provides a further significant savings in computation.
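
The energy term in the denominator of equation (9) can be illustrated numerically. The sketch below assumes that samples beyond the end of a vector are treated as zero when forming the autocorrelations, and it compares the autocorrelation form against the energy of the full (untruncated) convolution, for which the identity is exact. For a convolution truncated to the frame, as in equation (1), the two values can differ by the energy in the discarded tail. The function names and test data are hypothetical.

```python
import numpy as np

def autocorr(v, max_lag):
    """Autocorrelation at lags 0..max_lag, treating samples past the vector end as zero."""
    v = np.asarray(v, dtype=float)
    return np.array([np.dot(v[:len(v) - m], v[m:]) for m in range(max_lag + 1)])

def energy_from_autocorr(u, phi):
    """U(0)*phi(0) + 2*sum over m >= 1 of U(m)*phi(m), the denominator form of eq. (9)."""
    return u[0] * phi[0] + 2.0 * np.dot(u[1:], phi[1:])

rng = np.random.default_rng(1)
h = rng.standard_normal(10)
phi = autocorr(h, 9)        # computed once per frame, the role played by table 555'
c = rng.standard_normal(10)
u = autocorr(c, 9)          # computed for each candidate vector
energy = energy_from_autocorr(u, phi)
exact_full = float(np.sum(np.convolve(c, h) ** 2))  # energy of the full convolution
```

Because `phi` does not depend on the candidate, only `u` has to be recomputed inside the codebook search loop.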

The results of the normalized correlation of each vector in codebook 155 are compared in the peak selector
570' and the vector C_k(m) which has the maximum cross-correlation value is identified by peak selector 570'
as the optimum pitch period vector. The maximum cross-correlation can be expressed mathematically by
equation (10) below,

$$G_k \left[ \sum_{n=1}^{N} X(n) \left[ \sum_{m=1}^{n} C_k(m)\,H(n-m+1) \right] \right] \tag{10}$$

where G_k is defined in equation (9) and m is a dummy integer index.

The location of the pitch period, i.e., the index of code vector C_k(m), is provided at output 221' for transmittal
to channel coder 210.

The pitch gain is calculated using the selected pitch period candidate vector C k (m) by the gain calculator 
580' to generate the gain index 222'. 

The means and method described herein substantially reduce the computational complexity without loss
of speech quality. Because the computational complexity has been reduced, a vocoder using this arrangement
can be implemented much more conveniently with a single digital signal processor (DSP). The means and
method of the present invention can also be applied to other areas such as speech recognition and voice
identification, which use Minimum Squared Prediction Error (MSPE) search criteria.

While the present invention has been described in terms of a perceptually weighted target speech signal
X(n), sometimes called the target speech residual, produced by the method and apparatus described herein,
the method of the present invention is not limited to the particular means and method used herein to obtain the
perceptually weighted target speech X(n), but may be used with target speech obtained by other means and
methods and with or without perceptual weighting or removal of the filter ringing.

As used herein the word "residual" as applied to "speech" or "target speech" is intended to include situations
when the filter ringing signal has been subtracted from the speech or target speech. As used herein, the words
"speech residual" or "target speech" or "target speech residual" and the abbreviation "X(n)" therefore, are
intended to include such variations. The same is also true of the impulse response function H(n), which can be
a finite or infinite impulse response function, and with or without perceptual weighting. As used herein the
words "perceptually weighted impulse response function" or "filter impulse response" and the
notation "H(n)" therefore, are intended to include such variations. Similarly, the words "gain index" or "gain scaling
factor" and the notation G_k therefore, are intended to include the many forms which such "gain" or "energy"
normalization signals take in connection with CELP coding of speech.

Even with the advantages presented by the embodiment illustrated in FIG. 4, a significant computational
burden still remains. For example, evaluation of the autocorrelation coefficients in block 560' of FIG. 4 (see
equation (7)) requires (K)·(N!) multiplications in order to calculate the energy normalization (gain) coefficients
for the K vectors in codebook 155. Since K is typically of the order of 512 or 1024 and N is typically of the order
of 60 or 120 or 240, (K)·(N!) = (K)·(N)·(N-1)·(N-2) ... (2) is usually a very large number. These calculations are
in addition to those required by the operations of blocks 510', 520', 550' and others needed to recursively
determine the particular adaptive codebook vector C_k(n) and corresponding value of G_k, as well as the best fit
stochastic codebook vector and corresponding gain factor, which give the best fit (least error) of the target
speech X(n) to the input speech. This requires a substantial amount of computational power to perform the
necessary calculations in a reasonable time.

It has been found that the number of autocorrelation operations required to be performed on a codebook
having K vectors of N entries per vector can be substantially reduced without significant adverse impact on
speech quality. This is accomplished by the method comprising: autocorrelating the codebook vectors for a first
P of N entries (P << N) to determine first autocorrelation values therefor; evaluating the K codebook vectors
by producing synthetic speech using the K codebook vectors and the first autocorrelation values and comparing
the result to the input speech; determining which S of K codebook vectors (S << K) provide synthetic speech
having less error compared to the input speech than the K-S remaining vectors evaluated; autocorrelating the
codebook vectors for those S of K vectors for R entries (P < R ≤ N) in each codebook vector to provide second
autocorrelation values therefor; re-evaluating the S of K vectors using the second autocorrelation values to
identify which of the S codebook vectors provides the least error compared to the input speech; and forming
the CELP code for the frame of speech using the identity of the codebook vector providing the least error. For
K and N of the sizes described herein, P and S in the ranges of 5 ≤ P ≤ 10 and 1 ≤ S ≤ 7 are suitable. It is
desirable that R = N or N-1.

The above operations may also be described in terms of the equations and figures provided herein. For
example, instead of recursively evaluating equation (7) for m=0 to N-1 for each n=1 to N, and for each value
of k=1 to k=K, the following procedure is used:

(1) Perform autocorrelation of codebook vectors C_k(n) in block 550' according to equation (7), for m=0 to
m=P where P << N;

(2) Using the P values of U_k(P) found thereby, recursively evaluate all K vectors C_k(n) and choose those
S of K vectors C_k(n), S << K, providing the closest match to the input speech; then

(3) Recursively re-evaluate the S of K vectors chosen in step (2) above now using more than the initially
chosen P values, preferably all m=0 to m=N-1 values, for determining U_k(m) in equation (7) to determine
the value C_k(n) and corresponding gain index or factor G_k providing the best fit to the input speech;
and

(4) Send C_k(n) and G_k to channel coder 210, as before.
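
The coarse-then-fine procedure of steps (1) to (3) can be sketched as follows. This is only an illustration under stated assumptions: the convolution signal `w` and the impulse response autocorrelation `phi` are taken as given inputs, the score is the squared correlation divided by the autocorrelation-based energy, and a guard for non-positive truncated energies is added for the sketch. The names `two_stage_search`, `p` and `s` are hypothetical stand-ins for P and S.

```python
import numpy as np

def autocorr(v, max_lag):
    """Autocorrelation at lags 0..max_lag, treating samples past the vector end as zero."""
    v = np.asarray(v, dtype=float)
    return np.array([np.dot(v[:len(v) - m], v[m:]) for m in range(max_lag + 1)])

def two_stage_search(vectors, w, phi, p, s):
    """Score all K vectors with lags 0..p, then re-score the s best with all lags."""
    n = vectors.shape[1]
    corr = vectors @ w                        # cross-correlation of each candidate with w

    def score(k, max_lag):
        u = autocorr(vectors[k], max_lag)
        energy = u[0] * phi[0] + 2.0 * np.dot(u[1:], phi[1:max_lag + 1])
        if energy <= 0.0:                     # guard added for the sketch only
            return -np.inf
        return corr[k] ** 2 / energy

    coarse = np.array([score(k, p) for k in range(len(vectors))])   # steps (1)-(2)
    shortlist = np.argsort(coarse)[-s:]                              # S best candidates
    fine = {int(k): score(int(k), n - 1) for k in shortlist}         # step (3)
    best = max(fine, key=fine.get)
    return best, sorted(fine)

# Hypothetical data: K = 32 candidates of N = 20 samples.
rng = np.random.default_rng(2)
vectors = rng.standard_normal((32, 20))
h = rng.standard_normal(20)
phi = autocorr(h, 19)
w = rng.standard_normal(20)
best, shortlist = two_stage_search(vectors, w, phi, p=5, s=4)
```

Only the `s` shortlisted candidates incur the cost of the full set of lags; the remaining candidates are scored with `p + 1` lags each.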

As used herein, "recursively" is intended to refer to the repetitive analysis-by-synthesis codebook search 
and error minimization procedure described in connection with FIGS. 2A-B and 4. 

It has been found that output speech quality improves with increasing P up to about P = 10 with little further
improvement for P > 10. Good speech quality is obtained for 5 ≤ P ≤ 10. Speech quality degrades rapidly for
P < 5. Since N is usually of the order of 60 or more, a significant computational saving is obtained.

It has been found that useful speech quality results for values of S as small as S=1, and that speech quality
increases with increasing S. Beyond about S = 7, further improvement in speech quality becomes difficult to
detect. Thus, 1 ≤ S ≤ 7 is a useful operating range which provides significant reduction in the number of
computations that must be performed during the recursive search for the optimum codebook vectors and
corresponding gain index or factor. This makes it still easier to accomplish the desired VOCODER function using a
single digital signal processor.

A further problem exists with respect to how the codebook entries are structured and the autocorrelation
performed. This arises as a result of a procedure called "copy-up" that is frequently used in the prior art to
facilitate identification of short pitch periods (e.g., see Ketchum et al., supra). This is explained below.

The energy term of the error function in an adaptive code book search for the optimum pitch period can be
reduced to a linear combination of autocorrelation coefficients of two functions (see Eqs. 7-9). These two
functions are the impulse response function H(n) of the perceptually weighted short-time linear filter and the
codebook vectors C_k(n) of the adaptive code book. The computational complexity is greater for the adaptive
codebook than the stochastic codebook because the autocorrelation coefficients for the adaptive code book vectors
cannot be pre-computed and stored.

Each adaptive codebook vector is a linear array of N entries, also referred to as samples or values. Each
entry is identified by an index n running from 1 to N or from N to 1. Adjacent vectors in the codebook differ from
each other by one entry, that is, each successive vector has one new entry added at one end of the vector and
one old entry dropped from the other end of the vector with the intervening entries remaining the same. Thus,
except at the ends of the vector, adjacent vectors have identical entries displaced by one index number. If
adjacent vectors are placed side by side, they match up if displaced by one entry or sample. This is illustrated
schematically below for hypothetical adjacent vectors k, k' having arbitrary entry values between 0 and 9 and
indices n = 1 to 60. This displacement is referred to as the codebook "overlap".

Example I - Vector Overlap Illustration 

k(n):   1,2,3,4,5,6,7, ..., 55,56,57,58,59,60 (index)
        4,6,9,3,5,1,8, ...,  0, 4, 6, 8, 2, 3 (values)

k'(n):  1,2,3,4,5,6,7, ..., 55,56,57,58,59,60 (index)
        6,9,3,5,1,8,5, ...,  4, 6, 8, 2, 3, 7 (values)

It can be seen that the vector k' has the same entries as adjacent vector k displaced by one index, and that an
old entry has been dropped from one end (e.g., the value 4 is dropped from the left end) of the vector and a new
entry added at the other end (e.g., the value 7 added at the right end).

The autocorrelation function U_k(m) is given by Eq. 7 where m = 0 to N-1 is the "lag" value in the products
C_k(n)*C_k(n+m) and n = 1 to N is the index of the vector entries. Up to now it has been assumed that the vector
length N (i.e., the number of entries per codebook vector) and the frame length L (i.e., the number of speech
samples per analysis frame) are the same. But this is not always so. Different strategies are used for
determining autocorrelation coefficients depending on whether N and L are the same or different.

Where the vector length N is equal to or greater than a frame length L, the autocorrelation coefficients can
be calculated by a process called add-delete end correction. For example, the zero order or zero delay (lag m
= 0) autocorrelation coefficients of successive adaptive codebook vectors C_k, C_k', C_k'', etc., can be determined
by calculating the sum of the (C_k(n))^2 for the first vector and finding the other vectors by end correction. End
correction requires adding the square of the newly added vector value and subtracting off the square of the
just deleted vector value. This same procedure can be followed (with some variations) for m = 1, 2, 3, etc., with
the result that the computational burden is reduced as compared to calculating each autocorrelation coefficient
by evaluating Eq. 7 separately for each vector. This add-delete end correction process for determining
autocorrelation coefficients is well known in the art.
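
For the zero-lag case, add-delete end correction can be sketched in a few lines. The example assumes the overlapping vectors are windows of one linear sample array; the function name `sliding_energies` and the short eight-sample stream are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def sliding_energies(samples, n):
    """Zero-lag autocorrelation of each n-sample window, updated by add-delete end correction."""
    samples = np.asarray(samples, dtype=float)
    count = len(samples) - n + 1
    energies = np.empty(count)
    energies[0] = np.dot(samples[:n], samples[:n])   # direct sum for the first vector only
    for k in range(1, count):
        added = samples[k + n - 1]                   # entry newly included at one end
        deleted = samples[k - 1]                     # entry just dropped from the other end
        energies[k] = energies[k - 1] + added ** 2 - deleted ** 2
    return energies

# Hypothetical stream: a 7-entry window starts as 4,6,9,3,5,1,8 and shifts to 6,9,3,5,1,8,5.
samples = [4, 6, 9, 3, 5, 1, 8, 5]
energies = sliding_energies(samples, 7)
```

Each update costs two multiplications regardless of the window length, which is the saving the end correction provides over re-evaluating the full sum.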

Where the number of samples in the vector is less than a frame length L, it is common to "copy-up" the
vector to fill out the frame (e.g., see Ketchum et al., supra). For example, if the frame length is 60 and only twenty
entries are being used in the analysis, the 20 entries are repeated three times to obtain a vector length of sixty.
This is illustrated below in terms of the indices of the vector values.

Example II - Copy-up 

Vector:            1,2, ..., 59,60

Copied-up vector:  1,2, ..., 19,20, 1,2, ..., 19,20, 1,2, ..., 19,20.

This duplication or "copy-up" creates errors if one attempts to use the previously described add-delete end
correction method for calculating the autocorrelation coefficients. These errors degrade the quality of the
synthesized speech.
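
The prior art copy-up operation itself is simple to illustrate. The sketch below uses NumPy's `np.resize`, which fills a larger array with repeated copies of the input; the function name `copy_up` is hypothetical, and the arrays hold index positions rather than sample values, as in the example above.

```python
import numpy as np

def copy_up(short_vector, frame_len):
    """Repeat a short vector cyclically until it fills frame_len entries."""
    return np.resize(np.asarray(short_vector), frame_len)

filled_20 = copy_up(np.arange(1, 21), 60)   # 1..20 repeated three times
filled_21 = copy_up(np.arange(1, 22), 60)   # 1..21, 1..21, then 1..18
```

When the short vector grows from 20 to 21 entries, almost every position in the copied-up frame changes, which is why a simple one-in, one-out end correction no longer describes the difference between successive vectors.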

The end correction errors increase for larger values of m, i.e., the higher order (greater "lag") terms in the
autocorrelation function. The simple add-delete end correction procedure described earlier no longer works
satisfactorily on copied-up vectors. One is then left with the undesirable choice of accepting poorer speech quality
in order to have a smaller computational burden (e.g., easy end correction) or having higher speech quality
and a large computational burden (e.g., calculate each vector separately). It has been found that the
computational burden of obtaining the autocorrelation coefficients for the situation where the number of samples in
the vector is less than a frame length can be reduced without loss of synthesized speech quality by an improved
computational procedure and apparatus described below.

Assume that the analysis frame has a length L (e.g., 60) and codebook vectors with N samples or values
(e.g., 60) are to be used in connection with the apparatus and procedure of FIGS. 2-4 to determine the adaptive
codebook vector producing the best match to the target speech. Further assume that in order to quickly detect
short pitch periods, a smaller subset M < N of vector values (e.g., M = 20) is initially used for the analysis. In
the past the M samples or values would have been copied-up to fill out the frame of length L and the analysis
based on the copied-up frame. With the invented method, it is not necessary to copy-up the sub-frame of M
values.

The description provided in connection with this embodiment is directed particularly to efficiently
determining the autocorrelation coefficients of the adaptive codebook vectors and reference should be had to the
discussion of FIGS. 2-4 for an explanation of the other portions of the analysis process used for choosing the
codebook vector having the smallest error and the best match to the target speech.

Reference should also be had to Eq. 7 wherein the sum U_k(m) over n=1 to N and m = 0 to N-1 of the product
[C_k(n)*C_k(n+m)] is the autocorrelation coefficient of the k-th vector. The index m runs ordinarily from 0 to N-1
and identifies the "lag" used to calculate the autocorrelation coefficient. The index k running from 1 to K identifies
the codebook vector and the index n denotes an individual sample or value within the vector. The number of
samples used in the analysis depends upon the pitch period being detected. For example, about 20 samples
are required for the shortest pitch periods associated with the human voice and about 147 for the longest pitch
periods.

The 0th order autocorrelation coefficient corresponds to m = 0, the 1st order coefficient to m = 1, and so
forth. The "pitch lag" M < N is defined as the number of values in a vector that are to be used for the analysis.
Thus, in determining the autocorrelation coefficients for short pitch period speech components, m varies from
0 to M. The "frame size" L is defined as the number of samples of speech in the frame. Ordinarily, L = N. A
typical value for L is 60 and a typical value for M is 20, but other values can be used for both provided that M
< L. For convenience of explanation, the values of L = 60 and M = 20 are assumed in the discussion that follows.
However, those of skill in the art will understand based on the description herein that this is not intended to be
limiting and that other values of M and L can also be used.

The present invention provides a means and method for reducing the computational burden of determining
the autocorrelation coefficients and avoiding the copy-up errors. It applies to the portion of the recursive analysis
by synthesis procedure where copy-up was formerly used, that is, where a limited number of codebook samples
(e.g., 20) are needed to quickly identify the shortest pitch periods, but where the limited number of samples
must be expanded to the analysis frame length (e.g., 60) to avoid energy normalization problems. Once the
first M+k-1 vectors have been analyzed and vector expansion is completed so that N = L, then the
autocorrelation coefficients are calculated by the add-delete end correction process discussed earlier.

In a preferred embodiment, the method of the present invention comprises:

(1) Determining the autocorrelation coefficient U_k for the first vector k by evaluating Eq. 7 for m=0 to T <
M and n=1 to M and multiplying the result by L/M, where L, M, P, n, and m have the meanings described above.
For L=60 and M=20, L/M=3. The parameter T determines how many values of the autocorrelation lag m
are used, i.e., how many autocorrelation coefficients are calculated. Typically, T = M-1, but other smaller
values of T may also be used. Using a smaller value of T is advantageous if the dominant values in the
codebook vector are clustered so that the dominant autocorrelation coefficients are those for small values
of m.

(2) Determining the autocorrelation coefficient U_k for the second vector k' by taking the sum of the products
in Eq. 7 for each value of m previously obtained in step (1) and adding (C_k'(n=M+1))^2 to the m=0 term, adding
C_k'(n=M+1)*C_k'(n=M+2) to the m=1 term, adding C_k'(n=M+1)*C_k'(n=M+3) to the m=2 term, and so forth up to
the T-th term, and multiplying the result by L/(M+1);

(3) Determining the autocorrelation coefficient U_k for the third vector k'' by taking the sum of the products
for each value of m previously obtained in step (2) and adding (C_k''(n=M+2))^2 to the m=0 term, adding
C_k''(n=M+2)*C_k''(n=M+3) to the m=1 term, adding C_k''(n=M+2)*C_k''(n=M+4) to the m=2 term, and so forth up
to the T-th term, and multiplying the result by L/(M+2); and

(4) Determining the remaining autocorrelation coefficients for the remaining vectors by continuing as in (1)-
(3) above, incrementing the values by one for each additional vector until L/(M+k-1) = 1. Thereafter, the
autocorrelation coefficients are calculated by the conventional prior art add-delete procedure described
earlier.

Stated another way, the autocorrelation coefficients of the codebook vectors are determined by calculating
the coefficient U_k(m) of the first vector k=1 using Eqs. 11a-b below,

$$U'_1(m) = \sum_{n=1}^{M} [C_1(n)\,C_1(n+m)] \tag{11a}$$

$$U_1(m) = \left(\frac{L}{M}\right) U'_1(m) \tag{11b}$$

for m = 0 to T < M, and then calculating the autocorrelation coefficients U_k(m) of the remaining codebook vectors
incrementally using Eqs. 12a-b below,

$$U'_k(m) = [U'_{k-1}(m) + C_k(M+k-1)\,C_k(M+k-1+m)] \tag{12a}$$

$$U_k(m) = \left(\frac{L}{M+k-1}\right) U'_k(m) \tag{12b}$$

for m = 0 to T < M and for (M+k-1) ≤ L. The analysis by synthesis is performed using vectors (and their
corresponding autocorrelation coefficients) of increasing length, starting with a vector of length M and increasing
the length of each successive vector by one sample until the vector length equals the frame length, i.e., until
(M+k-1) = L. The expansion of the short pitch sample to match the frame length is then complete. Subsequent
vectors have the same length as the frame length and each successive vector of the overlapping codebook
corresponds to deleting an old sample from one end and adding a new sample at the other end of the vector.
The prior art add-delete end correction method is then used for determining the autocorrelation coefficients of
the remaining vectors being analyzed.

It will be noted that the sum of the products in Eq. 11a need be evaluated only once for the first vector and
then other vectors up to (M+k-1) can be calculated from the terms of the first vector by adding the contribution
of the C_k*C_k products for the additional values or samples being included. No copy-up procedure is required
and the errors in the autocorrelation coefficients created by copy-up do not arise. This substantially reduces
the computational burden in the analysis by synthesis procedure described in connection with FIGS. 2-4.
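
The incremental update of Eqs. 11a-12b can be sketched in software as follows. This is only an illustrative sketch, not the patented apparatus: the function name `incremental_autocorr`, the flat sample list `c`, and the requirement that `c` hold at least L + T samples are assumptions made here so that the C_k(M+k-1+m) term of Eq. 12a can be read directly from the overlapping codebook.

```python
def incremental_autocorr(c, M, L, T):
    """Scaled coefficients U_k(m), m = 0..T, for vector lengths M..L.

    c is a flat list of codebook samples with c[0] holding sample n = 1;
    it must contain at least L + T samples (an assumption of this sketch).
    """
    if len(c) < L + T:
        raise ValueError("need at least L + T samples")

    def sample(n):
        # 1-based access, matching the indexing used in the text
        return c[n - 1]

    # Eq. 11a: the full sum of products is evaluated once, for the first vector.
    raw = [sum(sample(n) * sample(n + m) for n in range(1, M + 1))
           for m in range(T + 1)]
    coeffs = {1: [L / M * u for u in raw]}            # Eq. 11b

    k = 2
    while M + k - 1 <= L:
        newest = M + k - 1
        for m in range(T + 1):
            # Eq. 12a: one extra product per lag, with no copy-up step.
            raw[m] += sample(newest) * sample(newest + m)
        coeffs[k] = [L / newest * u for u in raw]     # Eq. 12b
        k += 1
    return coeffs
```

With M = 20 and L = 60 the sketch returns 41 coefficient sets (k = 1 to 41), after which the vector length equals the frame length and the add-delete procedure would take over.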

The difference between the prior art copy-up and the invented procedure is illustrated schematically below
in terms of the vector indices. Calculation of the autocorrelation coefficients involves summing the products of
the vector with itself for various amounts of lag m, i.e., relative displacement of the vector. The examples below
show which values are multiplied together for various amounts of lag m for the copy-up approach and the
invented approach. The numbers in the examples are the indices of the vector values or entries, not the values
themselves, and may be thought of as a measure of the position of each entry along the vector.

Example III - Copy-up Autocorrelation

For COPY-UP, multiply term by term and add, for each n and m, for example:

For (k=1, m=0), multiply
1,2,3,...,19,20,1,2,3,...,19,20,1,2,3,...,19,20 by
1,2,3,...,19,20,1,2,3,...,19,20,1,2,3,...,19,20;

For (k=2, m=0), multiply
1,2,...,19,20,21,1,2,...,19,20,21,1,2,...,17,18 by
1,2,...,19,20,21,1,2,...,19,20,21,1,2,...,17,18;

For (k=3, m=0), multiply
1,2,...,19,20,21,22,1,2,...,20,21,22,1,2,...,15,16 by
1,2,...,19,20,21,22,1,2,...,20,21,22,1,2,...,15,16;
and so forth for all k, m and n.

Example IV - Improved Autocorrelation (m=0)

For the invented arrangement, multiply and sum the first (e.g. 20) entries for m = 0 to M-1 and then add
products of the n=M+1, n=M+2, etc., entries, for example:

For (k=1, m=0), calculate
1,2,3,...,19,20 times
1,2,3,...,19,20 and multiply by L/M;

For (k=2, m=0)
obtain 1,2,3,...,19,20,21 times
1,2,3,...,19,20,21 by adding 21·21 to
the previous calculation for k=1, and
multiplying by L/(M+1);

For (k=3, m=0)
obtain 1,2,3,...,19,20,21,22 times
1,2,3,...,19,20,21,22 by adding 22·22
to the previous calculation for k=2, and
multiplying by L/(M+2); and
continuing for all m and until the vector length equals the
frame length and the last term 60·60 is added, then proceed
as in the prior art.

While only the 0th order term is illustrated in the above examples of the autocorrelation process for the prior
art and invented approach, those of skill in the art will understand based on the description herein how to shift
the vectors to represent the product terms for m=1, m=2, etc. As an aid to that process, the following example
is given for the present invention for k=1, k=2 and m=1:

Example V - Improved Autocorrelation (m=1)

For (k=1, m=1), calculate
1,2,3,...,19,20 times
1,2,...,18,19, and multiply by L/(M+1);

For (k=2, m=1)
obtain 1,2,3,...,19,20,21 times
1,2,3,...,19,20 by adding 20·21 to
the previous calculation for k=1 and
multiplying by L/(M+2);

For (k=3, m=1)
obtain 1,2,3,...,19,20,21,22 times
1,2,3,...,19,20,21 by adding 21·22
to the previous calculation for k=2, and
multiplying by L/(M+3); and
continuing for all k and m being evaluated up to L/(M+k-1) = 1.

An apparatus suitable for determining the autocorrelation coefficients in the manner described above
according to a preferred embodiment of the present invention is illustrated in FIG. 5. Autocorrelation apparatus
600 corresponding to the present invention comprises signal input 602 where vector samples C_k(n) are received
from adaptive codebook 155 of FIG. 4. Vector samples or values C_k(n) follow two paths 604, 606. Path 606
passes via switch 608 to initial vector (i.e., k=1) autocorrelator 610. Initial vector autocorrelator 610 performs
the functions indicated by Eq. 11a, that is, it calculates the autocorrelation coefficients U_1(m) corresponding to
k=1, m=1, 2, 3, ..., T-1, T. These autocorrelation coefficients are delivered via switch 620 to end correction
coefficient calculator 622.

First vector autocorrelation coefficient calculator 610 comprises registers 612 and 614 into which the first
M (e.g., 20) samples in the codebook are loaded. Registers 612, 614 are conveniently well known
serial-in/parallel-out registers, but other arrangements well known in the art can also be used.

The sample values are transferred to autocorrelator 616 which determines the sum of the products U_1(m)
= SUM[C_1(n)C_1(n+m)] for m=0 (i.e., U_1(0)) and clocks this coefficient out to block 622 through switch 620.
Autocorrelator 616 then shifts the samples in register 614 by one sample, via block 618, corresponding to m=1,
and calculates U_1(1), which is then clocked out to block 622. This procedure continues until all of the
autocorrelation coefficients for initial vector C_1(n) have been determined and loaded into block 622. Switches 608 and
620 then disconnect autocorrelation generator 610 from block 622.

Block 622 performs the function described by Eqs. 11b and 12a-b. This is conveniently accomplished by
the combination of register 624, multipliers 626, adders 628, register-accumulators 630, multiplier 632 and
output buffer 634. Registers 624, 630 and buffer 634 conveniently have the same length as registers 612, 614 (as
shown for example in FIG. 5), but may be longer or shorter depending on how many autocorrelation coefficients
are desired to be evaluated and updated for subsequent vectors. For example, registers 624, 630 and buffer
634 can be as large as the frame length.

Register elements 630 contain the previously calculated autocorrelation coefficients to which end
corrections are to be added to determine the autocorrelation coefficients for subsequent vectors. The end corrections
are provided by register 624 in combination with multipliers 626. The end corrections from multipliers 626 are
added to the previously calculated coefficients from register 630 in adders 628 and fed back to update register
630 via loops 629. From register 630, the autocorrelation coefficients are transferred to multiplier 632 where
they are scaled by the appropriate L/(M+k-1) factor and sent to output buffer 634 where they form, for example,
output 561' in FIG. 4, wherein autocorrelation generator 600 describes element 560' in more detail for
(M+k-1) ≤ L.

Describing the operation of block 622 in more detail, register 624 is loaded with the vector values at the
same time as registers 612, 614. Register 630 is loaded with output U_1(m) of first vector autocorrelation
coefficient generator 610 before autocorrelator 610 is disconnected from block 622. These initial autocorrelation
coefficients are copied to multiplier 632 wherein they are multiplied by L/M and sent to buffer 634 from which
they are extracted during the analysis by synthesis procedure described in connection with FIGS. 2-4.

After register 630 has been loaded with the first T autocorrelation coefficient values, then an additional
vector value is clocked into register 624 and the vector value in each stage of register 624 is clocked out as shown
by arrows 625. Assuming that the initial vector had M values, the most recent value now present in register
624 is n=M+1. This corresponds to vector k=2 since each vector differs from the previous vector by the addition
of one entry until n = (M+k-1) = L.

The new value n=M+1 is multiplied by itself in multiplier 626_1 and the result delivered to adder 628_1 where
it is combined with the 0th order U_1(m=0) coefficient already stored in register element 630_1. Register element
630_1 is then updated as indicated by arrow 629_1 so that the sum of U_1(0) + C_k(M+1)C_k(M+1) is now present
in register element 630_1 and transferred to multiplier 632 where it is multiplied by L/(M+1) and loaded into buffer
634, along with the other updated coefficient values from the other elements of register 630 which have been
multiplied in 632 by the same factor. Counter 640 is provided to keep track of the number of codebook vector
entries that have been loaded into register 624 and adjust the multiplication factor in multiplier 632 so that it
corresponds to L/M for k=1, L/(M+1) for k=2, L/(M+2) for k=3, and so forth up to (M+k-1) = L.

Sample C_k(M) from register 624 is multiplied by C_k(M+1) in multiplier 626_2 and summed with U_1(1) from
register element 630_2 in adder 628_2, which sum updates register element 630_2 via connection 629_2. The
updated value is sent to multiplier 632 where it is multiplied by L/(M+1) and sent to buffer 634. The remaining
samples in register 624 are processed in a like manner and then another sample, e.g., n=M+2, is clocked into
register 624 and the process repeated. In this fashion, the autocorrelation coefficients are available in buffer
634 for each new vector formed by the addition of one more sample to the previous vector, in the same fashion
as is illustrated in simplified form in Examples IV-V.

While the temporary storage elements 612, 614, 624, 630, and 634 have been described as registers or
buffers, those of skill in the art will understand based on the description herein that this is merely for convenience
of explanation and that other forms of data storage can also be used, as for example and not limited to, random
access memory, content addressable memory, and so forth. In addition, such memory can have a wide
variety of physical implementations, for example, flip-flops, registers, core and semiconductor memory elements.
As used herein the terms "register" and "buffer", whether singular or plural, are intended to include any
modifiable information store of whatever kind or construction. Similarly, the other blocks identified, as for example,
autocorrelator 616, indexer 618, switches 608, 620, adders 628, multipliers 626 and/or counter 640, are
intended to include equivalent functions of any form, whether separate elements or a combination of elements,
or standard or application specific integrated circuits, or programmed general purpose processors able to
perform the described functions, separately or in combination.

The present invention provides a rapid and simple method of determining the autocorrelation coefficients
for a standard analysis frame length (e.g., 60) based on a shorter set of codebook vector samples (e.g., 20)
which are needed to detect short pitch periods, without introducing the former copy-up errors involved in
expanding the small number of codebook samples to the standard frame length. The computational burden is
reduced without sacrifice of speech quality because the end autocorrelation add-delete errors associated with
the prior art copy-up arrangement are avoided. Copy-up is avoided entirely.

While the invented apparatus for generating the autocorrelation coefficients has been described above in
terms of hardware registers, autocorrelators, multipliers, adders, switches and the like, those of skill in the art
will understand that these can be implemented in software so as to configure a computer to perform the same
functions as have been described herein for the apparatus and to execute the method of the present invention
based on the detailed description of the embodiments provided herein, and that such variations are
contemplated by the present invention.

The above-described improvements significantly reduce the computational burden associated with
determining the optimum codebook vectors for replicating target speech, but further improvement is still desired. In
particular, improvement is desired in the manner in which the optimum vector of the stochastic codebook is
identified.

In United States Patent number 4,797,925, Lin describes a procedure for reducing the computational
burden of considering all the vectors in the stochastic codebook by use of overlapping stochastic vectors. With
Lin's arrangement, each successive vector in the codebook differs from the preceding vector by having an old
value dropped from one end of the vector and a new value added at the other end of the vector. With this
arrangement, the number of unique values in a codebook composed of 1024 vectors each having 60 values is
reduced from 1024x60 = 61,440 to 60+1023 = 1,083. Even so, a large number of computations is still required
to carry out the analysis and the steps are time consuming because they involve successive multiplication and
addition.

Stochastic codebook 180 (see FIG. 2B) contains K vectors S_k(n) of length N, where k = 1 to K and n = 1 to
N, and K is conveniently 512, 1024, 2048, etc., typically 1024, and N is conveniently 20, 40, 60, 120, etc.,
typically 60. The indices k and n for stochastic codebook vectors S_k(n) have the same interpretation as for adaptive
codebook vectors C_k(n), that is, k identifies which vector is being considered and n identifies the value being
considered within vector k. It is convenient that index limits K and N for the vectors of stochastic codebook 180
have the same magnitudes as index limits K and N for the vectors of adaptive codebook 155, but this is not
essential. Merely for convenience of explanation and not intending to be limiting, K and N are taken to have
the same values for both codebooks, for example, K = 1024 and N = 60.

The vectors in stochastic codebook 180 are conveniently a linear array of pseudo-random 0's and 1's or
0's, 1's and -1's. That is, each vector S_k(n) is a string of N values, each value identified by index n. FIG. 6 shows
an exemplary ternary (e.g., 0, 1, -1) stochastic codebook 180' analogous to codebook 180 but with K=8 and
N=20. Persons of skill in the art will understand based on the description herein how the features of the
codebook of FIG. 6 apply for larger values of K and N. Further, while FIG. 6 illustrates a ternary (e.g., 0, 1, -1)
codebook, a binary (e.g., 0, 1 or 0, -1) or other type of codebook may also be used. A ternary codebook is preferred.

The vectors S_k(n) in FIG. 6 for each successive value of k overlap by N-2. For example, vector S_2(n) differs
from vector S_1(n) by having two old values dropped from the left end of vector S_1(n) and two new values added
at the right end of vector S_2(n). Thus, the values of vector S_2(n) are shifted two places to the left compared to
vector S_1(n) and there are two new values at the right end. Each succeeding vector differs from the previous
vector in the same way. The choice of overlap amount, e.g., N-2 in FIG. 6, is convenient but not essential. Any
value of overlap may be employed, e.g., 1 to N-1. Also, while the vectors have been described as being shifted
to the left with new values being added at the right, the opposite convention may also be used, i.e., shift right
and add new values at the left.
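
The overlapping codebook structure can be illustrated with a short sketch. The helper names `unique_sample_count` and `overlapping_vectors`, and the representation of the codebook as one flat Python list, are assumptions made for illustration only; the shift of two samples per vector follows the N-2 overlap of FIG. 6, and a shift of one reproduces the count given for Lin's arrangement.

```python
def unique_sample_count(K, N, shift=1):
    """Number of distinct stored samples for K overlapping vectors of length N."""
    return N + (K - 1) * shift


def overlapping_vectors(samples, K, N, shift=2):
    """Slice K vectors of length N out of one flat sample list.

    Vector k (0-based here) starts `shift` samples after vector k-1, so
    successive vectors overlap by N - shift samples.
    """
    if len(samples) < unique_sample_count(K, N, shift):
        raise ValueError("not enough samples for K overlapping vectors")
    return [samples[k * shift: k * shift + N] for k in range(K)]
```

For K = 8, N = 20 and a shift of two, 34 stored samples suffice, and each vector equals its predecessor shifted two places to the left with two new values at the right end.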

The analysis procedure for identifying the optimal stochastic codebook vector is substantially the same as
for the adaptive codebook vector, but with S_k(n) substituted for C_k(n), i.e., codebook 180 for codebook 155,
and with the perceptually weighted short and long delay target speech signal Y(n) (see 176 of FIGS. 2A-B)
substituted for the perceptually weighted short delay target speech signal X(n) (see 151 of FIGS. 2A-B). Eqs.
1', 2', 5' and 6' below are analogous, respectively, to Eqs. 1, 2, 5, 6 presented earlier, but with the appropriate
variables for the stochastic code book substituted for those previously described for the adaptive code book:

$$Z'_k(n) = \sum_{m=1}^{n} S_k(m)\, H(n-m+1), \qquad n = 1, \ldots, N \tag{1'}$$



$$\sum_{n=1}^{N} W(n), \qquad n = 1, \ldots, N \tag{2'}$$



$$W'(n) = \sum_{m=1} Y(m)\, H(n-m+1), \qquad n = 1, \ldots, N \tag{5'}$$



$$\sum_{n=1}^{N} W'(n)\, S_k(n), \qquad n = 1, \ldots, N \tag{6'}$$



A significant difference between the stochastic and adaptive codebooks is that the vectors making up
stochastic codebook 180 do not change as a result of the analysis-by-synthesis process, as do those in codebook
155, but are fixed. Thus, many of the computations represented by Eqs. 1'-6' can be performed once per frame
and the result stored and reused. For example, the autocorrelation of the stochastic codebook vectors need
be performed only once since the result is invariant. The autocorrelation coefficients are conveniently stored
in a look-up table and need not be recomputed. This greatly simplifies the computational burden.

It has been discovered that the process involved in determining which of the stochastic codebook vectors
best represents the target speech can be substantially simplified and made more rapid by eliminating the
multiplication of the values of stochastic codebook vectors S_k(n) by other signals nominally required by Eqs. 1',
2', 5', 6'. While the invented means and method is most usefully applied to the cross-correlation operations
involving stochastic codebook vectors, it may also be applied to the convolution operations which involve
stochastic codebook vectors. For convenience of explanation, the invented arrangement is described for the
correlation operations, but those of skill in the art will understand based on the description herein how it may be
applied to convolution operations.

Cross-correlation is accomplished in a first embodiment by means of a multiplexer-accumulator
combination where the select lines of the multiplexer are driven by the codebook or one or more replicas of the codebook.
This is explained in more detail in connection with FIGS. 7-10.

FIG. 7 is a simplified block diagram of stochastic codebook cross-correlator 700 according to the present
invention. Correlator 700 is shown for the case of a ternary (e.g., 0, 1, -1) codebook. Those of skill in the art
will understand based on the description herein that the present invention applies to binary and other types of
codebooks as well. The procedure described below can also be used to convolve the codebook vectors with
other signals.

Correlator 700 has input 701 where it receives signal or signals 702 to be cross-correlated with the
codebook vectors, as for example but not limited to, signal W'(n) from Eq. 5', or another signal to be correlated with
the codebook vectors S_k(n). Signals 702 received at input 701 are generally vectors having N values identified
by an index, e.g., n or m running from 1 to N. For example, if Eq. 6' is being evaluated, then W'(n) is presented
at input 701. If Eq. 1' is being evaluated, then H(n-m+1) is presented at input 701. While the invented
arrangement is particularly useful in connection with speech VOCODERS, it may be used in connection with any signal
or string of similar form.

For convenience of explanation, the means and method of the present invention are described for
evaluation of Eq. 6', but those of skill in the art will understand based on the description herein that it applies to any
other sum of the products of two vectors or vector arrays where one vector or vector array has fixed values,
as for example but not limited to 1,0 or -1,0 or -1,0,1, while the other may be variable. The evaluation of Eq. 6'
produces a single cross-correlation value Q(k) for each value of index k, that is:

$$Q(k) = \sum_{n=1}^{N} W'(n)\, S_k(n), \qquad n = 1, \ldots, N \tag{6''}$$

Vector signal 702 (e.g., W'(n)) supplied to input 701 is transferred to multiplexers 704, 705. Multiplexer
704 is illustrated in more detail in FIG. 8 and multiplexer 705 is substantially identical. Coupled to multiplexer
704 is memory 706, as for example, a ROM or EPROM having non-zero entries corresponding to the 1's in
code book 180. Other types of memory may also be used, but non-volatile memory is most convenient. FIG. 9
illustrates the content of memory 706' analogous to memory 706 but with K=8 and N=20, and corresponding
to the content of codebook 180' of FIG. 6. The indices k and n have the same function in connection with memory
706 (and memory 707) as in codebook 180, i.e., k identifying vectors or other data strings corresponding to
vectors and n identifying values within the vectors or strings. Memory 706, 706' has 0's everywhere except
where a 1 appears in codebook 180, 180' (compare FIGS. 6 and 9). The output of memory 706 is coupled to
select lines 708 of multiplexer 704 so that each value k, n controls a particular select line n acting on the value
of the vector being provided at input 701.

Coupled to multiplexer 705 is memory 707 which is analogous to memory 706 but having non-zero entries
corresponding to the -1's in codebook 180. FIG. 10 illustrates the content of memory 707' analogous to memory
707 but with K=8 and N=20, and corresponding to codebook 180' of FIG. 6. Memory 707, 707' has 1's
everywhere a -1 appeared in codebook 180, 180' and 0's otherwise (compare FIGS. 6 and 10). The output of memory
707 is coupled to select lines 709 of multiplexer 705 so that each value k, n controls a particular select line n
acting on the value of the vector being presented at input 701.

Memories 706, 707 are controlled by address sequencer 714. As the signal vector 702 is presented at input
701 to be correlated with the first (i.e., k=1) codebook vector, sequencer 714 accesses the k=1 data set of
memories 706, 707 and transfers values n=1 to n=N for k=1 to corresponding multiplexers 704, 705 on select lines
708, 709. The values appearing on select lines 708, 709 cause multiplexers 704, 705 to pass the appropriate
values of input vector 702 to accumulators 712, 713 where they are summed to produce outputs 716, 717.
Outputs 716, 717 are combined in combiner 720 to provide the first cross-correlation, i.e., Q(1), at output 721.

Sequencer 714 then selects the k=2 values in memories 706, 707 and transfers the n=1 to N values therein
for k=2 to select lines 708, 709 of multiplexers 704, 705, and so forth to produce the second cross-correlation,
i.e., Q(2), at output 721. This process is repeated until input vector signal 702 for a speech frame has been
correlated with the codebook vectors represented by the entries in memories 706, 707 to obtain cross-correlation
values Q(1), ..., Q(K). The stochastic vector of index k=j having a larger value of Q(k=j) generally gives a better
representation of speech than another vector k=i having a smaller value of Q(k=i).

While the use of two memories 706, 707 is convenient for a ternary codebook, more or fewer may be used
according to the type of coding used in codebook 180. For example, only one memory need be used for a binary
codebook, and the codebook itself can suffice as the memory if it is able to deliver the 0, 1 values corresponding
to n=1 to N to the multiplexer select lines for each index k. Thus, in the case of a binary codebook or equivalent,
a separate memory may not be required and the codebook itself can be used to supply signals to the select
lines of the multiplexer.

Referring now to FIG. 8, the operation of multiplexer 704 is described. The construction and operation of
multiplexer 705 is similar. Multiplexer 704 is generally an N by N multiplexer having N gates 715, denoted by
G1, ..., GN. One input to each of gates 715 is connected to input 701 to receive a particular value (identified by
index n) of an input signal vector 702, and another input 703 is tied to the system logical 0 reference level, e.g.,
ground. Gates 715 couple output 710 to either input 701 (i.e., signal 702) or input 703 (i.e., "zero"), as
determined by the logical signal present on select lines 708. For the arrangement shown, a value of 1 on, for example,
line n=i of select lines 708 causes the n=i value of input vector 702 (appearing on the n=i line of input 701) to
be transferred to the n=i line of output 710, otherwise a value of 0 is transferred. Any equivalent logic
arrangement having an analogous result will also serve.

Multiplexer 704 is capable of receiving N input signal values 702 on input 701 and N select values on select
lines 708 and transferring up to N values from input signal 702 to outputs 710 according to whether select lines
708 driven by memory 706 are set to 0 or 1. The operation of multiplexer 705 is similar with respect to inputs
702, select lines 709 driven by memory 707 and outputs 711, except that multiplexer 705 passes the values
of input vector signal 702 at input 701 to output 711 for indices k, n where the codebook vector value is -1 while
multiplexer 704 passes the input vector values 702 to output 710 for indices k, n where the codebook vector
value is +1.

Outputs 710 and 711 are coupled to accumulators 712, 713 respectively, wherein the input vector signal
values 702 transferred through multiplexers 704, 705 are added together to produce outputs 716, 717
corresponding to the Q+(k) and Q-(k) correlation values, respectively. Outputs 716, 717 are combined in combiner
720 to produce correlation output values Q(k) at 721. Where codebook 180 is a ternary codebook, as in this
example, then combiner 720 takes the difference of outputs 716, 717 from accumulators 712, 713 to produce
output 721, i.e., Q(k) = Q+(k) - Q-(k). This takes into account that the operations performed by multiplexer 705,
memory 707 and accumulator 713 correspond to the -1 values of codebook 180, e.g., see FIG. 10. While
combiner 720 subtracts in this particular implementation, those of skill in the art will understand based on the
description herein that the same result could be obtained by many other means. For example, and not intended
to be limiting, the same output 721 is obtained by inverting the output of multiplexer 705 or accumulator 713
and making combiner 720 an adder.
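
A minimal software model of the multiplexer-accumulator arrangement may help make the multiplication-free correlation concrete. This is only a sketch under stated assumptions: the function names `mux_accumulate` and `cross_correlate_ternary` are hypothetical, the two select masks stand in for memories 706 and 707, and Python lists stand in for the hardware signal lines.

```python
def mux_accumulate(w, select):
    """Model of one multiplexer plus accumulator.

    Passes w[n] when select[n] is 1 and a zero otherwise, then sums the
    passed values. No multiplication is performed.
    """
    return sum(x for x, s in zip(w, select) if s)


def cross_correlate_ternary(w, vector):
    """Q(k) = Q+(k) - Q-(k) for one ternary codebook vector (values 0, 1, -1)."""
    plus_mask = [1 if v == 1 else 0 for v in vector]    # role of memory 706
    minus_mask = [1 if v == -1 else 0 for v in vector]  # role of memory 707
    q_plus = mux_accumulate(w, plus_mask)               # role of accumulator 712
    q_minus = mux_accumulate(w, minus_mask)             # role of accumulator 713
    return q_plus - q_minus                             # role of combiner 720
```

The result equals the ordinary sum of products of `w` with the ternary vector, but is obtained using only selection, addition and one subtraction.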

Correlation generator 700 of FIG. 7 corresponds, for example, to correlation generators 520 or 520' of FIGS.
3-4 and output 721 of correlation generator 700 corresponds to output 521 of FIG. 3 or output 551' of FIG. 4
but for stochastic codebook vectors S_k(n) rather than adaptive codebook vectors C_k(n) and for target speech
signal Y(n) rather than X(n), depending upon what particular input signal vector is being processed.

A further embodiment of the present invention will now be described in connection with Eq. 6" and FIGS. 
6, 9, 10. Applying Eq. 6" to codebook 180' of FIG. 6 yields the correlation values Q(1) through Q(8) for values 
of W'(n) where n=1 to 20, as shown in Table I: 



Table I

Q(1) = +W'(04)-W'(05)-W'(09)+W'(14)-W'(18)+W'(19)
Q(2) = +W'(02)-W'(03)-W'(07)+W'(12)-W'(16)+W'(17)
Q(3) = -W'(01)-W'(05)+W'(10)-W'(14)+W'(15)+W'(20)
Q(4) = -W'(03)+W'(08)-W'(12)+W'(13)+W'(18)-W'(19)
Q(5) = -W'(01)+W'(06)-W'(10)+W'(11)+W'(16)-W'(17)
Q(6) = +W'(04)-W'(08)+W'(09)+W'(14)-W'(15)-W'(19)
Q(7) = +W'(02)-W'(06)+W'(07)+W'(12)-W'(13)-W'(17)
Q(8) = -W'(04)+W'(05)+W'(10)-W'(11)-W'(15)+W'(20)



The array of Table I may be rearranged to group the terms which correspond to the +1 codebook values and
the terms which correspond to the -1 codebook values so as to express the correlation values as Q(k) = [Q+(k)]
- [Q-(k)], as shown in Table II:

Table II

Q(1) = [W'(04)+W'(14)+W'(19)] - [W'(05)+W'(09)+W'(18)]
Q(2) = [W'(02)+W'(12)+W'(17)] - [W'(03)+W'(07)+W'(16)]
Q(3) = [W'(10)+W'(15)+W'(20)] - [W'(01)+W'(05)+W'(14)]
Q(4) = [W'(08)+W'(13)+W'(18)] - [W'(03)+W'(12)+W'(19)]
Q(5) = [W'(06)+W'(11)+W'(16)] - [W'(01)+W'(10)+W'(17)]
Q(6) = [W'(04)+W'(09)+W'(14)] - [W'(08)+W'(15)+W'(19)]
Q(7) = [W'(02)+W'(07)+W'(12)] - [W'(06)+W'(13)+W'(17)]
Q(8) = [W'(05)+W'(10)+W'(20)] - [W'(04)+W'(11)+W'(15)]

The values shown in the left-most brackets of Table II correspond to the input to accumulator 712 and the values
in the right-most brackets of Table II correspond to the input to accumulator 713.

Referring to codebook 180' of FIG. 6, it is apparent that the codebook is sparsely populated, i.e., most of
the entries are 0's. Further, referring to Tables I and II, it is apparent that the overlapping nature of the
successive vectors is reflected in the indices of the values of W'(n) being summed to obtain the correlation values
Q(k). Accordingly, the codebook structure lends itself to more economical ways of generating the sums
indicated in Tables I and II. These are described below.

Rather than store all of the codebook values, one can store only the indices (i.e., the values of n) of the
non-zero entries for each value of k. This is most conveniently accomplished separately for the Q+(k) and the
Q-(k) values, but that is not essential. The correlation values Q+(k) and Q-(k) for each value of k are obtained
merely by summing the W'(n) values corresponding to the stored values of n for each value of k, i.e., executing
the sums shown in Tables I or II.
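
The index-storage approach can be sketched as follows, using the index sets read from the left-most and right-most brackets of Table II. The dictionary names `PLUS_INDICES` and `MINUS_INDICES` and the function `sparse_correlation` are hypothetical names chosen for this illustration, and `w` is assumed to be a list in which `w[n-1]` holds W'(n).

```python
# Indices n of the +1 entries for each vector k (left-most brackets of Table II).
PLUS_INDICES = {
    1: [4, 14, 19], 2: [2, 12, 17], 3: [10, 15, 20], 4: [8, 13, 18],
    5: [6, 11, 16], 6: [4, 9, 14], 7: [2, 7, 12], 8: [5, 10, 20],
}
# Indices n of the -1 entries for each vector k (right-most brackets of Table II).
MINUS_INDICES = {
    1: [5, 9, 18], 2: [3, 7, 16], 3: [1, 5, 14], 4: [3, 12, 19],
    5: [1, 10, 17], 6: [8, 15, 19], 7: [6, 13, 17], 8: [4, 11, 15],
}


def sparse_correlation(w, k):
    """Q(k) = Q+(k) - Q-(k), summing only the W'(n) at stored indices."""
    q_plus = sum(w[n - 1] for n in PLUS_INDICES[k])
    q_minus = sum(w[n - 1] for n in MINUS_INDICES[k])
    return q_plus - q_minus
```

Only six additions and one subtraction are needed per vector in this example, instead of twenty multiply-accumulate steps.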

The computational and/or the address storage requirements can be further reduced and speedier operation
obtained by using a recursive computational method that takes into account the overlapping nature of the
codebook entries. With this approach, which is preferred, one stores the index values n of the codebook entries
for k=1 and calculates the indices n for vectors k=2, k=3, etc., from the index values of k=1 based on the
codebook overlap. The indices of any new codebook entries added at the ends of each vector are also taken into
account.

For example, in the case of the +1 entries in FIG. 9 and the Q+(k) portion of Table II (i.e., left-most bracketed
quantities), one stores n = 4, 14, 19 and the codebook overlap, in this case N-2 (i.e., Δk = +1, Δn = -2) and
calculates the contribution to the Q(k)'s that come from the corresponding W'(n) values, as follows:
The k=1, n=4 index is evaluated first and contributes to the Q+(1) and Q+(2) values the terms:

Table III

Q+(1) = W'(04)
Q+(2) = W'(02).

The Q+(2) term W'(02) for index k=2, n=2 is determined by applying the codebook overlap (Δk = +1, Δn = -2) to
the first index k=1, n=4.

The k=1, n=14 index is evaluated next and contributes additional terms W'(14), W'(12), W'(10), W'(08),
W'(06), W'(04) and W'(02). All the terms except W'(14) are determined by applying the codebook
overlap to the starting index k=1, n=14. The result is as follows:

Table IV

Q+(1) = W'(04)+W'(14)
Q+(2) = W'(02)+W'(12)
Q+(3) = W'(10)
Q+(4) = W'(08)
Q+(5) = W'(06)
Q+(6) = W'(04)
Q+(7) = W'(02).

The k=1, n=19 index is evaluated next and contributes additional terms W'(19), W'(17), W'(15), W'(13),
W'(11), W'(09), W'(07), W'(05), W'(03) and W'(01). All terms except the W'(19) are found by applying the
codebook overlap to the starting index k=1, n=19. The result is as follows, where the sequence has been extended
for vectors k>8 to show how the contribution continues for higher vector numbers:

Table V

Q+(1) = W'(04)+W'(14)+W'(19)
Q+(2) = W'(02)+W'(12)+W'(17)
Q+(3) = W'(10)+W'(15)
Q+(4) = W'(08)+W'(13)
Q+(5) = W'(06)+W'(11)
Q+(6) = W'(04)+W'(09)
Q+(7) = W'(02)+W'(07)
Q+(8) = W'(05)
Q+(9) = W'(03)
Q+(10) = W'(01).

This exhausts the indices for k=1 and all of the values that can be determined therefrom based on the codebook
overlap. No additional non-zero values appear at the ends of vectors k=1, k=2, so correlation values Q+(1), Q+(2)
are now complete.

The next index to be included is k=3, n=20, which contributes additional terms W'(20), W'(18), W'(16), W'(14), W'(12), W'(10), W'(08), W'(06), W'(04) and W'(02). Again, terms except W'(20) are identified by applying the codebook overlap to the starting index k=3, n=20. The result is as follows, where the sequence has been extended for vectors k>10 to show how the contribution continues for higher vector numbers:

Table VI

Q+(1) = W'(04) + W'(14) + W'(19)
Q+(2) = W'(02) + W'(12) + W'(17)
Q+(3) = W'(10) + W'(15) + W'(20)
Q+(4) = W'(08) + W'(13) + W'(18)
Q+(5) = W'(06) + W'(11) + W'(16)
Q+(6) = W'(04) + W'(09) + W'(14)
Q+(7) = W'(02) + W'(07) + W'(12)
Q+(8) = W'(05) + W'(10)
Q+(9) = W'(03) + W'(08)
Q+(10) = W'(01) + W'(06)
Q+(11) = W'(04)
Q+(12) = W'(02).



This exhausts the indices for k=1 through k=7 and all of the values that can be determined therefrom based on the codebook overlap. No additional non-zero values appear at the ends of vectors k=1 through k=7, so correlation values Q+(1) through Q+(7) are now complete.
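The accumulation pattern walked through in Tables III to VI can be sketched in a few lines. This is a minimal illustration only, assuming the four starting indices named in the text and the overlap (Δk = +1, Δn = -2); the function name `accumulate_terms` is hypothetical, and no upper limit K on the vector index is applied because the text does not give one for this example.

```python
from collections import defaultdict

def accumulate_terms(starts, dk=1, dn=-2):
    """List, for each vector index k, the sample indices n whose W'(n)
    values are added into Q+(k).

    starts: (k, n) pairs of newly identified non-zero entries, in the
            order they are evaluated.
    dk, dn: the codebook overlap; each start (k, n) also implies entries
            at (k + dk, n + dn), (k + 2*dk, n + 2*dn), ... while n >= 1.
    """
    terms = defaultdict(list)
    for k, n in starts:
        while n >= 1:
            terms[k].append(n)
            k += dk
            n += dn
    return dict(terms)

# Starting indices taken from the walk-through in the text.
starts = [(1, 4), (1, 14), (1, 19), (3, 20)]
terms = accumulate_terms(starts)

for k in sorted(terms):
    print(f"Q+({k}) = " + " + ".join(f"W'({n:02d})" for n in terms[k]))
```

Only additions of W'(n) values are implied by the resulting lists; no multiplications by codebook entries are needed for the Q+(k) portion.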

The above-described process continues until the non-zero entries in the codebook have been exhausted and all Q+(k) correlation values have been determined. The process used for the Q-(k) values is substantially identical. Separating the ternary codebook into separate portions for calculating Q+(k) and Q-(k) avoids having to account for the sign of the individual entries during the above-described process for calculating the Q(k) correlation values taking advantage of the codebook overlap, but that is not precluded. Q(k) is found by the difference Q(k) = Q+(k) - Q-(k). The vector of index k=j having the largest correlation value Q(j) is identified by comparing the Q(k) values for k=1 to k=K (or for at least some sub-set thereof), using means well known in the art. The correlation values determined above are used in connection with other information in the analysis-by-synthesis process previously described to identify the optimal stochastic codebook vector, that is, the stochastic code book vector which, when used to synthesize speech, provides the least error compared to the input target speech. This optimal stochastic code book vector from code book 180 is then used in part to construct the VOCODE being transmitted, which is eventually used to again reproduce the input speech in the receiver.
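As a rough illustration of the difference Q(k) = Q+(k) - Q-(k) and the selection of the largest value, the sketch below splits each ternary vector into a +1 portion and a -1 portion and forms both partial sums by addition only. The function names, the tiny codebook and the values of `w` are made up for illustration and are not taken from the specification.

```python
def split_ternary(vec):
    """Split a ternary vector (entries 0, +1, -1) into two 0/1 masks."""
    plus = [1 if s == 1 else 0 for s in vec]
    minus = [1 if s == -1 else 0 for s in vec]
    return plus, minus

def masked_sum(mask, w):
    """Add the values of w at positions where the mask is 1 (no multiplies)."""
    return sum(x for m, x in zip(mask, w) if m)

def best_vector(codebook, w):
    """Return (j, q): q[k-1] = Q+(k) - Q-(k), and j is the 1-based index
    of the vector with the largest Q."""
    q = []
    for vec in codebook:
        plus, minus = split_ternary(vec)
        q.append(masked_sum(plus, w) - masked_sum(minus, w))
    j = max(range(len(q)), key=q.__getitem__) + 1
    return j, q

# Illustrative data only.
codebook = [[1, 0, -1, 0], [0, 1, 1, 0], [-1, 0, 0, 1]]
w = [3, 5, 2, 7]
j, q = best_vector(codebook, w)

# The add-only result agrees with the ordinary sum of products.
dot = [sum(s * x for s, x in zip(vec, w)) for vec in codebook]
assert q == dot
```

In this toy case the second vector gives the largest correlation, and the comparison against `dot` shows that the add-only sums match the multiply-accumulate form for entries restricted to 0, +1 and -1.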

Stated more generally, the above-described process for coding speech using a combination of a first vector V(n) having values identified by index n running from n=1 to N, and a set of second vectors S_k(n) wherein each of the second vectors is identified by index k and wherein each of the second vectors has up to N values which are either zero or non-zero and are identified by index n from n=1 to N, comprises: identifying indices n_{k,i} of S_k(n) for different k wherein S_k(n_i) are non-zero, adding values of the V(n) corresponding to indices n_{k,i} to form sums Q(k), identifying the value k=j corresponding to the largest value Q(k=j), and synthesizing speech using S_j(n).

It is further desirable that successive vectors of the set of second vectors are determined by overlap of the preceding second vector according to an overlap amount Δk, Δn, wherein the identifying and adding steps comprise: identifying for k=1 indices n_{1,i} of S_k(n) wherein S_1(n_i) are non-zero; starting from n_{1,i} and using the overlap amount Δk, Δn, determining further indices n_{k,i'} for k>1 wherein S_k(n_i') are non-zero; and adding values of the V(n) for such indices and further indices to form sums Q(k). It is further convenient to identify for k≥2 a first index n_{k,i''} not previously identified wherein S_k(n_i'') is non-zero, and then, starting from index n_{k,i''}, determining still further indices n_{k,i'''} for k≥3 wherein S_k(n_i''') are non-zero using the overlap amount, and adding values of V(n) for such still further indices to further form sums Q(k).
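The generalized procedure can be checked numerically with a small sketch. It rests on assumptions that are not stated in the text: the overlapped codebook is modelled as a sliding window over one long 0/1 sequence (so that an entry at (k, n) reappears at (k+1, n-2)), only one sign portion is handled, and all function names are hypothetical.

```python
import random

def build_overlapped_codebook(base, N, K, shift=2):
    """Assumed layout: S_k(n) = base[(n - 1) + shift * (k - 1)], n = 1..N, k = 1..K."""
    return [[base[(n - 1) + shift * (k - 1)] for n in range(1, N + 1)]
            for k in range(1, K + 1)]

def overlap_sums(codebook, v, shift=2):
    """Form Q(k) by locating each new non-zero entry once and propagating
    it to later vectors with the overlap (dk = +1, dn = -shift)."""
    K, N = len(codebook), len(codebook[0])
    q = [0] * K
    for k in range(1, K + 1):
        # For k = 1 every position is new; for k > 1 only the last `shift`
        # positions can hold entries not inherited from vector k - 1.
        first_new = 1 if k == 1 else N - shift + 1
        for n in range(first_new, N + 1):
            if codebook[k - 1][n - 1] == 0:
                continue
            kk, nn = k, n
            while kk <= K and nn >= 1:
                q[kk - 1] += v[nn - 1]   # addition only
                kk += 1
                nn -= shift
    return q

def direct_sums(codebook, v):
    """Reference: scan every entry of every vector."""
    return [sum(x for s, x in zip(vec, v) if s != 0) for vec in codebook]

random.seed(0)
N, K = 20, 12
base = [1 if random.random() < 0.2 else 0 for _ in range(N + 2 * (K - 1))]
codebook = build_overlapped_codebook(base, N, K)
v = [random.randint(-9, 9) for _ in range(N)]
assert overlap_sums(codebook, v) == direct_sums(codebook, v)
```

Under these assumptions, vectors after the first need only their last two positions inspected for new non-zero entries; everything else follows from the overlap. A ternary codebook could be handled by running the same routine on the +1 and -1 portions separately.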

While the foregoing method may be practiced on a general purpose computer, it should be programmed to provide a means for identifying indices n_{k,i} of S_k(n) for different k wherein S_k(n_i) are non-zero, a means for adding values of the V(n) corresponding to indices n_{k,i} to form sums Q(k), a means for identifying the value k=j corresponding to the largest value Q(k=j), and a means for synthesizing speech using S_j(n). Those of skill in the art will understand how to do this.

It is further desirable that it be programmed to provide a means for identifying for k=1 indices n_{1,i} of S_k(n) wherein S_1(n_i) are non-zero, a means for determining further indices n_{k,i'} for k>1 wherein S_k(n_i') are non-zero, starting from n_{1,i} and using the overlap amount Δk, Δn, and a means for adding values of the V(n) for such indices and further indices to form sums Q(k). It is further desirable that the means for identifying, determining and adding comprise a means for identifying for k≥2 a first index n_{k,i''} not previously identified wherein S_k(n_i'') is non-zero, a means for determining still further indices n_{k,i'''} for k≥3 wherein S_k(n_i''') are non-zero, starting from index n_{k,i''} and using the overlap amount, and a means for adding values of V(n) for such still further indices to further form sums Q(k).

The above-described means and method of providing the equivalent of the sum of the products of a first vector having n=1 to N values by a set of k=1 to K second vectors having n=1 to N values by taking advantage of the sparse non-zero values of the codebook and the overlapping nature of the codebook vectors results in substantially reduced computational burden compared to the prior art and may be accomplished more quickly and with substantially less computational resources than required by prior art. The above-described process is conveniently accomplished on a general purpose computer or a special purpose computer, programmed to execute the procedures described herein and illustrated in Tables I-VI. Persons of skill in the art will understand, based on the description herein and using means well known in the art, how to program a computer to accomplish the above-described steps.

It will be apparent to those of skill in the art based on the description herein that the above-described means and method produces the same effect as the multiplication steps normally required for the cross-correlation process associated with determining which of the stochastic codebook vectors provides the best match with the target speech. By eliminating the multiply operation, the correlation operation is made faster and the required number of manipulations of the vector values is reduced. These benefits are highly advantageous.

Finally, the above-described embodiments of the invention are intended to be illustrative only. Numerous 
alternative embodiments may be devised without departing from the spirit and scope of the following claims. 


Claims 

1. An apparatus (220, 220') for coding a frame of speech comprising N successive samples n=1 to n=N of input analog speech, to determine an optimum codebook vector C_j(n) which best synthesizes the speech frame, comprising:

an adaptive codebook (155) containing K possible perceptually weighted excitation vectors C_k(n), where k is an integer index running from 1 to K for identifying the vectors, and n=1 to n=N is the integer index identifying the successive speech samples within the frame of speech;

one or more LPC filters (190, 195, 200, 205, 170, 145, 175, 150) for synthesizing trial replicas of the frame of speech when excited by the codebook vectors C_k(n), wherein the LPC filter (190, 195, 200, 205, 170, 145, 175, 150) has an impulse response H(n);

means (150) for generating a perceptually weighted target speech residual X(n) for comparison to the results of exciting the one or more LPC filters (190, 195, 200, 205, 170, 145, 175, 150) with the codebook vectors C_k(n) to determine the optimum codebook vector C_j(n) which best synthesizes the speech frame;

means (510') for convolving X(n) with H(n) for each value of n once per frame to produce a convolved output (512') for delivery to a cross-correlator (520');

means (520') for cross-correlating the convolved output (512') with C_k(n) for each value of n and k to produce a cross-correlated output (551') for delivery to a squarer (525') whose output (521') is coupled to a divider means (530');

means (560') for auto-correlating C_k(n) for each value of n and k to provide a first auto-correlation output (561') for delivery to a multiplier means (543');

means (550') for auto-correlating H(n) to produce a second auto-correlated output (552', 556') for delivery to the multiplier means (543');

means (543') for multiplying the first (561') and second (552') autocorrelation outputs to produce an output product (545') for delivery to an adder (540');

means (540') for summing the product for each value of k to produce a summed output (541') for delivery to the divider (530');

divider means (530') for finding the ratio of the squared cross-correlation output (521') and the adder summed output (541') for delivery to a selector means (570'); and

means (570') for selecting that value C_j(n) of C_k(n) which produces the greatest magnitude of output from the divider means (530'), for delivery to a channel coder (210).

2. A method for coding a frame of speech comprising N successive samples of input analog speech and using an adaptive codebook (155) containing K target perceptually weighted excitation vectors C_k(n), where k is an integer index running from 1 to K, and n is another integer index identifying successive speech samples n=1 to n=N within the frame of speech, to determine an optimum codebook vector C_j(n) which best synthesizes the speech frame, comprising:

providing one or more LPC filters (190, 195, 200, 205, 170, 145, 175, 150) for synthesizing trial replicas of the frame of speech when excited by the codebook vectors C_k(n), wherein the one or more LPC filters (190, 195, 200, 205, 170, 145, 175, 150) have an impulse response H(n);

providing a perceptually weighted target speech residual X(n) for comparison to the results of exciting the one or more LPC filters (190, 195, 200, 205, 170, 145, 175, 150) with the codebook vectors C_k(n);

convolving X(n) with H(n) for each value of n once per frame to produce a convolved output W(n) for delivery to a cross-correlator (520');

cross-correlating the convolved output W(n) with C_k(n) for each value of n and k to produce a cross-correlated output for delivery to a squarer (525') whose output is coupled to a divider (530');

auto-correlating C_k(n) for each value of n and k to provide a first auto-correlation output U_k(m), where m is a dummy index running from m=0 to m=N-1, for delivery to a multiplier (543');

auto-correlating H(n) to produce a second auto-correlated output 0(m), where m is a dummy index running from m=0 to m=N-1, for delivery to the multiplier (543');

multiplying the first and second autocorrelation outputs in the multiplier (543') to produce an output product (545') for delivery to a summer (540');

summing the product for each value of k to produce a summed output (541') for delivery to the divider (530');

dividing to obtain the ratio of the squared cross-correlation output (521') and the adder summed output (541') for delivery to a peak selector (570'); and

selecting in the peak selector that value C_j(n) of C_k(n) which produces the greatest magnitude of output from the divider (530'), for delivery to a channel coder (210).

3. A method for providing CELP coding for a frame of digitized input speech based on use of a codebook (155) containing K vectors each having N entries, comprising:

autocorrelating the codebook vectors for a first P of N entries (P ≪ N) to determine first autocorrelation values (561') therefor;

evaluating the K codebook vectors by producing synthetic speech using the K codebook vectors and the first autocorrelation values and comparing the result to the input speech;

determining which S of K codebook vectors (S ≪ K) provide synthetic speech having less error compared to the input speech than the K-S remaining vectors evaluated;

autocorrelating the codebook vectors for those S of K vectors for R entries (P < R ≤ N) in each codebook vector to provide second autocorrelation values therefor;

re-evaluating the S of K vectors using the second autocorrelation values to identify which of the S codebook vectors provides the least error compared to the input speech; and

forming the CELP code for the frame of speech using the identity of the codebook vector providing the least error.

4. The method of claim 3 wherein 5 ≤ P ≤ 10, 1 ≤ S ≤ 7, and R = N or N-1.

5. The method of claim 3 wherein the frame of digitized input speech denoted as X(n) comprises n=1 to n=N successive samples of input analog speech and the codebook is an adaptive codebook (155) containing K target perceptually weighted excitation vectors C_k(n), where k is an integer index running from 1 to K, and n is another integer index identifying successive speech samples n=1 to n=N within the frame of speech, and wherein C_j(n) denotes an optimum codebook vector which best synthesizes the target speech frame X(n), and wherein:

the first autocorrelating step comprises autocorrelating codebook vectors C_k(n) according to the equation

$$U_k(m) = \sum_{n=1}^{N} \left[ C_k(n)\, C_k(m+n) \right], \quad m = 0, \ldots, N-1$$

for m=0 to m=P where P ≪ N; and wherein,

the evaluating step comprises recursively evaluating, in a codebook searcher, all K vectors C_k(n) using the P values of U_k(P) found from the equation to determine the mean square error probability; and wherein,

the determining step comprises choosing those S of K vectors C_k(n), where S ≪ K, providing the closest match to the target speech X(n); and wherein,

the second autocorrelating and re-evaluating steps comprise recursively re-evaluating in a codebook searcher the S of K vectors chosen above, now using all m=0 to m=N-1 values for determining U_k(m) in the equation, thereby selecting the j-th value C_j(n) and a corresponding gain index G_j providing the best fit to the target speech X(n); and wherein

the forming step comprises sending C_j(n) and G_j to a channel coder for transmission to a CELP synthesizer.

6. An apparatus (100, 220, 220', 600) for CELP coding of speech employing autocorrelation coefficients of vectors of an adaptive codebook (155) wherein analysis initially utilizes a subset of samples M in connection with a speech analysis frame of length L > M, comprising:

means (610) for determining autocorrelation coefficients U_k(m) of a first vector C_k(n) of length M, where k = 1 and m is an autocorrelation lag index and n is an index of successive samples in the codebook vector, according to a first equation,

$$U'_1(m) = \sum_{n=1}^{M} \left[ C_1(n)\, C_1(n+m) \right]$$

for m = 0 to T < M;

means (624, 626, 628, 630) for determining autocorrelation coefficients U_k(m) of remaining codebook vectors incrementally where k ≥ 2 according to a second equation,

$$U'_k(m) = \left[ U'_{k-1}(m) + C_k(M+k-1)\, C_k(M+k-1+m) \right]$$

for m = 0 to T < M until (M+k-1) = L;

means (632) for scaling the result of the first equation according to

$$U_1(m) = \frac{L}{M}\, U'_1(m)$$

and scaling the result of the second equation according to

$$U_k(m) = \frac{L}{M+k-1}\, U'_k(m)$$

for m = 0 to T < M to produce a result for each m and each k; and

means (220, 220') for using the result to evaluate which codebook vector provides a least error compared to input speech.

7. A method for CELP coding speech employing autocorrelation coefficients of vectors of an adaptive codebook (155) of vector length N wherein analysis initially utilizes a subset of samples M < N with a speech analysis frame of length L, comprising:

calculating autocorrelation coefficients U_k(m) of a first vector C_k(n) of length M, where k = 1 and m is an autocorrelation lag index and n is an index of the successive samples in the codebook vector, according to,

$$U'_1(m) = \sum_{n=1}^{M} \left[ C_1(n)\, C_1(n+m) \right]$$

$$U_1(m) = \frac{L}{M}\, U'_1(m)$$

for m = 0 to T < M;

calculating the autocorrelation coefficients U_k(m) of the remaining codebook vectors incrementally where k ≥ 2 according to,

$$U'_k(m) = \left[ U'_{k-1}(m) + C_k(M+k-1)\, C_k(M+k-1+m) \right]$$

$$U_k(m) = \frac{L}{M+k-1}\, U'_k(m)$$

for m = 0 to T < M;

repeating the second calculating step until (M+k-1) = L; and

using the above-determined autocorrelation coefficients in determining which of the codebook vectors C_k(n) produces the least error when compared to input speech.

8. The method of claims 6 and 7 wherein T = M-1.

9. A method for CELP coding speech employing autocorrelation coefficients of vectors of an adaptive codebook (155) identified by an index k, wherein analysis by synthesis initially utilizes M < L codebook values where L is the speech analysis frame length and m is an index running from 0 to M-1 describing the autocorrelation lag, comprising:

calculating m=0 to M-1 autocorrelation coefficients of a first codebook vector k having n = 1 to M values therein where n is an index of the code vector values;

placing the m=0 to M-1 calculated autocorrelation coefficients in a temporary store (630);

scaling the coefficients in the temporary store by a multiplying factor L/M and transferring the result to an output (634);

multiplying codebook values for n=M+j where j=1 by codebook values for n = M+j down to n = 1 and adding the products to the m=0 to M-1 autocorrelation coefficients, respectively, from the temporary store (630) to produce a result;

replacing the autocorrelation coefficients in the temporary store (630) by the result;

scaling the coefficients in the temporary store (630) by a multiplying factor L/(M+j) and transferring the result to the output (634);

repeating the multiplying, replacing, scaling and transferring steps for j=2 to j=k-1 and k=(L+1-M); and

using the autocorrelation coefficients transferred to the output to determine which of the codebook vectors provides better CELP coding of speech.

10. An apparatus (100, 700) for CELP coding speech by combining a first vector with a set of second vectors identified by an index k, wherein the first and second vectors have values identified by indices n running from n=1 to N, comprising:

a first N by N multiplexer (704) having n=1 to N outputs, n=1 to N first inputs, a second input, and n=1 to N select means, wherein a first logic level presented to the n-th select means couples the n-th output to the n-th first input and a second logic level presented to the n-th select means couples the n-th output to the second input;

first means for supplying n=1 to N values of the first vector to the n=1 to N first inputs (701) of the first multiplexer (704);

means (706) for presenting n=1 to N values of the second vector of index k=1 to the n=1 to N select means (708) of the first multiplexer (704), the second vector providing at the n=1 to N select means the first logic level for some values of n and the second logic level for other values of n;

first accumulator means (712) coupled to the first multiplexer output (710) for adding together values of the first vector transferred to the outputs (710) of the first multiplexer (704) to provide a first sum (716);

means (714) for indexing k from k=1 to k=K; and

means (228) for synthesizing speech based on whichever sum identifies a second vector giving the closest match to target speech.

11. The apparatus of claim 10 wherein the second vectors have values 0, +1, -1, and wherein the means for presenting the n=1 to N values of the second vector comprises two portions, a first portion (706) having entries 0, 1 corresponding to the locations of values of 0, +1 of the second vectors and a second portion (707) having values 0, 1 corresponding to the locations of values 0, -1 of the second vectors, and wherein the presenting means presents the first portion thereof to the select means (708) of the first multiplexer (704), and wherein the apparatus further comprises:

a second N by N multiplexer (705) having n=1 to N outputs, n=1 to N first inputs, a second input, and n=1 to N select means, wherein a first logic level presented to the n-th select means couples the n-th output to the n-th first input and a second logic level presented to the n-th select means couples the n-th output to the second input, and wherein the second portion of the presenting means is coupled to the select means of the second multiplexer;

second means (701) for supplying n=1 to N values of the first vector to the n=1 to N first inputs of the second multiplexer;

second accumulator means (713) coupled to the second multiplexer output for adding together values of the first vector transferred to the outputs of the second multiplexer to provide a second sum (717); and

means (720) for combining the first (716) and second (717) sums to produce a result (721) used for determining which input vector (702) gives the closest match to target speech.

12. An apparatus for CELP coding speech using a combination of a first vector V(n) having values identified by index n running from n=1 to N, and a set of second vectors S_k(n) wherein each of the second vectors is identified by index k and wherein each of the second vectors has up to N values which are either zero or non-zero and are identified by index n from n=1 to N, comprising:

means (704, 706) for identifying indices n_{k,i} of S_k(n) for different k wherein S_k(n_i) are non-zero;

means (712, 721) for adding values of the V(n) corresponding to indices n_{k,i} to form sums Q(k);

means (714) for identifying the value k=j corresponding to the largest value Q(k=j); and

means (228) for synthesizing speech using S_j(n).

13. The apparatus of claim 12 wherein successive vectors of the set of second vectors are determined by overlap of the preceding second vector according to an overlap amount Δk, Δn, wherein the means for identifying and adding comprise:

means (705, 707) for identifying for k=1 indices n_{1,i} of S_k(n) wherein S_1(n_i) are non-zero;

means (705, 707) for determining further indices n_{k,i'} for k>1 wherein S_k(n_i') are non-zero, starting from n_{1,i} and using the overlap amount Δk, Δn; and

means (713, 721) for adding values of the V(n) for such indices and further indices to form sums Q(k).

14. The apparatus of claim 13 wherein the means for identifying, determining and adding comprise means for identifying for k≥2 a first index n_{k,i''} not previously identified wherein S_k(n_i'') is non-zero, means for determining still further indices n_{k,i'''} for k≥3 wherein S_k(n_i''') are non-zero, starting from index n_{k,i''} and using the overlap amount, and means for adding values of V(n) for such still further indices to further form sums Q(k).

15. A method for CELP coding speech by combining a first vector (701) with a set of second vectors identified by index k, wherein the first and second vectors have values identified by indices n running from n=1 to N, comprising:

providing an N by N first multiplexer (704) having n=1 to N outputs, n=1 to N first inputs, second inputs, and n=1 to N select means, wherein a first logic level presented to the n-th select means couples the n-th output to the n-th first input and a second logic level presented to the n-th select means couples the n-th output to the second input;

supplying n=1 to N values of the first vector (701) to the n=1 to N first inputs of the first multiplexer;

presenting n=1 to N values of the second vector (702) of index k=1 to n=1 to N select means of the first multiplexer (704), the second vector providing at the n=1 to N select means the first logic level for some values of n and the second logic level for other values of n;

adding together values of the first vector coupled to the first multiplexer output to provide a sum;

repeating the presenting and adding steps for further values of k; and

synthesizing speech based on whichever sum identifies a second vector giving the closest match to target speech.

16. The method of claim 15 wherein the second vectors have values 0, +1, -1, and wherein the step of presenting the n=1 to N values of the second vector comprises presenting a first portion having entries 0, 1 corresponding to the locations of values of 0, +1 of the second vectors to the first multiplexer and presenting a second portion having values 0, 1 corresponding to the locations of values 0, -1 of the second vectors to a second multiplexer like the first multiplexer and responsive to the first input vector at inputs thereof and the second portion of the second vectors at select means thereof, in the same manner as the first multiplexer;

adding together values of the first vector coupled to the outputs of the first multiplexer to provide a first sum, and adding together values of the first vector coupled to outputs of the second multiplexer to provide a second sum; and

combining the first and second sums to provide an output useful for determining which input vector gives the closest match to target speech.

17. A method for CELP coding speech using a combination of a first vector V(n) having values identified by index n running from n=1 to N, and a set of second vectors S_k(n) wherein each of the second vectors is identified by index k and wherein each of the second vectors has up to N values which are either zero or non-zero and are identified by index n from n=1 to N, comprising:

identifying indices n_{k,i} of S_k(n) for different k wherein S_k(n_i) are non-zero;

adding values of the V(n) corresponding to indices n_{k,i} to form sums Q(k);

identifying the value k=j corresponding to the largest value Q(k=j); and

synthesizing speech using S_j(n).

18. The method of claim 17 wherein successive vectors of the set of second vectors are determined by overlap of the preceding second vector according to an overlap amount Δk, Δn, wherein the identifying and adding steps comprise:

identifying for k=1 indices n_{1,i} of S_k(n) wherein S_1(n_i) are non-zero;

starting from n_{1,i} and using the overlap amount Δk, Δn, determining further indices n_{k,i'} for k>1 wherein S_k(n_i') are non-zero; and

adding values of the V(n) for such indices and further indices to form sums Q(k).

19. The method of claim 18 further comprising identifying for k≥2 a first index n_{k,i''} not previously identified wherein S_k(n_i'') is non-zero, and then, starting from index n_{k,i''}, determining still further indices n_{k,i'''} for k≥3 wherein S_k(n_i''') are non-zero using the overlap amount, and adding values of V(n) for such still further indices to further form sums Q(k).